Iterate through rows of grouped pandas dataframe to create new columns - python
I'm new to Python and am trying to get to grips with Pandas for data analysis.
I wondered if anyone can help me loop through rows of grouped data in a dataframe to create new variables.
Suppose I have a dataframe called data, that looks like this:
+----+-----------+--------+
| ID | YearMonth | Status |
+----+-----------+--------+
| 1 | 201506 | 0 |
| 1 | 201507 | 0 |
| 1 | 201508 | 0 |
| 1 | 201509 | 0 |
| 1 | 201510 | 0 |
| 2 | 201506 | 0 |
| 2 | 201507 | 1 |
| 2 | 201508 | 2 |
| 2 | 201509 | 3 |
| 2 | 201510 | 0 |
| 3 | 201506 | 0 |
| 3 | 201507 | 1 |
| 3 | 201508 | 2 |
| 3 | 201509 | 3 |
| 3 | 201510 | 4 |
+----+-----------+--------+
There are multiple rows for each ID, YearMonth is of the form yyyymm, and Status is the status at each YearMonth (it takes values 0 to 6).
I have managed to create columns to show me the cumulative maximum status, and an Ever3 indicator (to show me if an ID has ever had a status of 3 or more, regardless of current status), like this:
data['Max_Stat'] = data.groupby(['ID'])['Status'].cummax()
data['Ever3'] = np.where(data['Max_Stat'] >= 3, 1, 0)
I would also like to create further columns for metrics such as the number of times something has happened, or how long it has been since an event. For example:
Times3Plus : To show how many times the ID has had a status 3 or more at that point in time
Into3 : Set to Y the first time the ID has a status of 3 or more (not for subsequent times)
+----+-----------+--------+----------+-------+------------+-------+
| ID | YearMonth | Status | Max_Stat | Ever3 | Times3Plus | Into3 |
+----+-----------+--------+----------+-------+------------+-------+
| 1 | 201506 | 0 | 0 | 0 | 0 | |
| 1 | 201507 | 0 | 0 | 0 | 0 | |
| 1 | 201508 | 0 | 0 | 0 | 0 | |
| 1 | 201509 | 0 | 0 | 0 | 0 | |
| 1 | 201510 | 0 | 0 | 0 | 0 | |
| 2 | 201506 | 0 | 0 | 0 | 0 | |
| 2 | 201507 | 1 | 1 | 0 | 0 | |
| 2 | 201508 | 2 | 2 | 0 | 0 | |
| 2 | 201509 | 3 | 3 | 1 | 1 | Y |
| 2 | 201510 | 0 | 3 | 1 | 1 | |
| 3 | 201506 | 0 | 0 | 0 | 0 | |
| 3 | 201507 | 1 | 1 | 0 | 0 | |
| 3 | 201508 | 2 | 2 | 0 | 0 | |
| 3 | 201509 | 3 | 3 | 1 | 1 | Y |
| 3 | 201510 | 4 | 4 | 1 | 2 | |
+----+-----------+--------+----------+-------+------------+-------+
I can do this quite easily in SAS, using BY and RETAIN statements, but can't work out how to replicate this in Python.
I have managed to do this without iterating over each row, as I'm not sure that what I was trying to do was possible. I had wanted to set up counters or indicators at group level, as is possible in SAS, and modify these row by row, e.g. something like:
Times3Plus = 0
if row['Status'] >= 3:
    Times3Plus += 1
return Times3Plus
In the end, I created a binary 3Plus indicator:
data['3Plus'] = np.where(data['Status'] >= 3, 1, 0)
Then I used groupby to accumulate these into Times3Plus within each group:
data['Times3Plus'] = data.groupby(['ID'])['3Plus'].cumsum()
Into3 could then be populated using a function:
def into3(row):
    if row['3Plus'] == 1 and row['Times3Plus'] == 1:  # i.e. it is the first time
        return 1
data['Into3'] = data.apply(into3, axis=1)
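For reference, the same columns can be built without a row-wise apply. This is only a sketch: it rebuilds the example table from the question, assumes the rows should be ordered by ID and YearMonth, and uses the illustrative name is3plus for the intermediate flag.

```python
import pandas as pd

# Example data copied from the table in the question.
data = pd.DataFrame({
    "ID": [1] * 5 + [2] * 5 + [3] * 5,
    "YearMonth": [201506, 201507, 201508, 201509, 201510] * 3,
    "Status": [0, 0, 0, 0, 0, 0, 1, 2, 3, 0, 0, 1, 2, 3, 4],
})

# Cumulative operations depend on row order, so sort within each ID first.
data = data.sort_values(["ID", "YearMonth"]).reset_index(drop=True)

# Boolean flag: is the status 3 or more on this row? (illustrative helper name)
is3plus = data["Status"].ge(3)

data["Max_Stat"] = data.groupby("ID")["Status"].cummax()
data["Ever3"] = data["Max_Stat"].ge(3).astype(int)

# Running count of 3+ rows within each ID (the SAS RETAIN-style counter).
data["Times3Plus"] = is3plus.astype(int).groupby(data["ID"]).cumsum()

# "Y" only on the row where the running count first reaches 1.
data["Into3"] = (is3plus & data["Times3Plus"].eq(1)).map({True: "Y", False: ""})

print(data)
```

The per-group cumsum plays the role of a counter that resets for each BY group, so no explicit loop over rows is needed.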
Related
Is there a method in turning user input into csv format?
This is the example data that would be pasted into an input() prompt and ideally I would like it to be processed and made into a csv file through python:
,,,,,,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Expected,Expected,Expected,SCA,SCA,Passes,Passes,Passes,Passes,Carries,Carries,Dribbles,Dribbles,-additional
Player,#,Nation,Pos,Age,Min,Gls,Ast,PK,PKatt,Sh,SoT,CrdY,CrdR,Touches,Press,Tkl,Int,Blocks,xG,npxG,xA,SCA,GCA,Cmp,Att,Cmp%,Prog,Carries,Prog,Succ,Att,-9999
Gabriel Jesus,9,br BRA,FW,25-124,82,0,0,0,0,1,0,0,0,40,13,1,1,0,0.1,0.1,0.0,4,0,20,27,74.1,2,33,1,4,5,b66315ae
Eddie Nketiah,14,eng ENG,FW,23-067,8,0,0,0,0,0,0,0,0,6,2,0,0,0,0.0,0.0,0.1,2,0,4,4,100.0,1,4,1,0,0,a53649b7
Martinelli,11,br BRA,LW,21-048,90,1,0,0,0,2,1,0,0,38,21,0,2,1,0.6,0.6,0.1,1,0,24,28,85.7,1,34,5,3,4,48a5a5d6
Bukayo Saka,7,eng ENG,RW,20-334,90,0,0,0,0,3,0,0,0,52,23,3,0,3,0.2,0.2,0.0,2,1,24,36,66.7,2,37,8,2,2,bc7dc64d
Martin Ødegaard,8,no NOR,AM,23-231,89,0,0,0,0,2,0,0,0,50,22,2,1,2,0.1,0.1,0.0,2,0,30,39,76.9,5,28,3,1,2,79300479
Albert Sambi Lokonga,23,be BEL,CM,22-287,1,0,0,0,0,0,0,0,0,2,0,0,0,0,0.0,0.0,0.0,0,0,1,1,100.0,0,1,1,0,0,1b4f1169
Granit Xhaka,34,ch SUI,DM,29-312,90,0,0,0,0,0,0,1,0,60,5,0,2,3,0.0,0.0,0.0,4,0,42,49,85.7,6,32,2,0,0,e61b8aee
Thomas Partey,5,gh GHA,DM,29-053,90,0,0,0,0,1,0,0,0,62,25,7,1,2,0.1,0.1,0.0,0,0,40,47,85.1,5,26,4,0,1,529f49ab
Oleksandr Zinchenko,35,ua UKR,LB,25-233,82,0,1,0,0,1,1,0,0,64,16,3,3,1,0.0,0.0,0.3,2,1,44,54,81.5,6,36,5,0,0,51cf8561
Kieran Tierney,3,sct SCO,LBWB,25-061,8,0,0,0,0,0,0,0,0,6,1,0,0,0,0.0,0.0,0.0,0,0,2,4,50.0,0,1,0,0,0,fce2302c
Gabriel Dos Santos,6,br BRA,CB,24-229,90,0,0,0,0,0,0,0,0,67,5,1,1,2,0.0,0.0,0.0,0,0,52,58,89.7,1,48,3,0,0,67ac5bb8
William Saliba,12,fr FRA,CB,21-134,90,0,0,0,0,0,0,0,0,58,3,1,2,2,0.0,0.0,0.0,0,0,42,46,91.3,1,35,1,0,0,972aeb2a
Ben White,4,eng ENG,RB,24-301,90,0,0,0,0,0,0,1,0,61,22,7,4,5,0.0,0.0,0.1,1,0,29,40,72.5,5,25,2,1,1,35e413f1
Aaron Ramsdale,1,eng ENG,GK,24-083,90,0,0,0,0,0,0,0,0,33,0,0,0,0,0.0,0.0,0.0,0,0,24,32,75.0,0,21,0,0,0,466fb2c5
14 Players,,,,,990,1,1,0,0,10,2,2,0,599,158,25,17,21,1.1,1.1,0.5,18,2,378,465,81.3,35,361,36,11,15,-9999
The link to the table is: https://fbref.com/en/matches/e62f6e78/Crystal-Palace-Arsenal-August-5-2022-Premier-League#stats_18bb7c10_summary
I have attempted to use pandas dataframe but I am only able to export the first row of headers and nothing else (only the items before player).
Would have been nice for you to include your attempt. Pandas works just fine:
import pandas as pd

url = 'https://fbref.com/en/matches/e62f6e78/Crystal-Palace-Arsenal-August-5-2022-Premier-League#stats_18bb7c10_summary'
df = pd.read_html(url)[10]
cols = [f'{each[0]}_{each[1]}' if 'Unnamed' not in each[0] else f'{each[1]}' for each in df.columns]
df.columns = cols
df.to_csv('output.csv', index=False)
Output:
print(df.to_markdown())
| | Player | # | Nation | Pos | Age | Min | Gls | Ast | PK | PKatt | Sh | SoT | CrdY | CrdR | Touches | Press | Tkl | Int | Blocks | xG | npxG | xA | SCA | GCA | Cmp | Att | Cmp% | Prog | Carries | Prog.1 | Succ | Att.1 |
|---:|:---------------------|----:|:---------|:------|:-------|------:|------:|------:|-----:|--------:|-----:|------:|-------:|-------:|----------:|--------:|------:|------:|---------:|-----:|-------:|-----:|------:|------:|------:|------:|-------:|-------:|----------:|---------:|-------:|--------:|
| 0 | Gabriel Jesus | 9 | br BRA | FW | 25-124 | 82 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 40 | 13 | 1 | 1 | 0 | 0.1 | 0.1 | 0 | 4 | 0 | 20 | 27 | 74.1 | 2 | 33 | 1 | 4 | 5 |
| 1 | Eddie Nketiah | 14 | eng ENG | FW | 23-067 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 2 | 0 | 0 | 0 | 0 | 0 | 0.1 | 2 | 0 | 4 | 4 | 100 | 1 | 4 | 1 | 0 | 0 |
| 2 | Martinelli | 11 | br BRA | LW | 21-048 | 90 | 1 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 38 | 21 | 0 | 2 | 1 | 0.6 | 0.6 | 0.1 | 1 | 0 | 24 | 28 | 85.7 | 1 | 34 | 5 | 3 | 4 |
| 3 | Bukayo Saka | 7 | eng ENG | RW | 20-334 | 90 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 52 | 23 | 3 | 0 | 3 | 0.2 | 0.2 | 0 | 2 | 1 | 24 | 36 | 66.7 | 2 | 37 | 8 | 2 | 2 |
| 4 | Martin Ødegaard | 8 | no NOR | AM | 23-231 | 89 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 50 | 22 | 2 | 1 | 2 | 0.1 | 0.1 | 0 | 2 | 0 | 30 | 39 | 76.9 | 5 | 28 | 3 | 1 | 2 |
| 5 | Albert Sambi Lokonga | 23 | be BEL | CM | 22-287 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 100 | 0 | 1 | 1 | 0 | 0 |
| 6 | Granit Xhaka | 34 | ch SUI | DM | 29-312 | 90 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 60 | 5 | 0 | 2 | 3 | 0 | 0 | 0 | 4 | 0 | 42 | 49 | 85.7 | 6 | 32 | 2 | 0 | 0 |
| 7 | Thomas Partey | 5 | gh GHA | DM | 29-053 | 90 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 62 | 25 | 7 | 1 | 2 | 0.1 | 0.1 | 0 | 0 | 0 | 40 | 47 | 85.1 | 5 | 26 | 4 | 0 | 1 |
| 8 | Oleksandr Zinchenko | 35 | ua UKR | LB | 25-233 | 82 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 64 | 16 | 3 | 3 | 1 | 0 | 0 | 0.3 | 2 | 1 | 44 | 54 | 81.5 | 6 | 36 | 5 | 0 | 0 |
| 9 | Kieran Tierney | 3 | sct SCO | LB,WB | 25-061 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 4 | 50 | 0 | 1 | 0 | 0 | 0 |
| 10 | Gabriel Dos Santos | 6 | br BRA | CB | 24-229 | 90 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 67 | 5 | 1 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 52 | 58 | 89.7 | 1 | 48 | 3 | 0 | 0 |
| 11 | William Saliba | 12 | fr FRA | CB | 21-134 | 90 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 58 | 3 | 1 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 42 | 46 | 91.3 | 1 | 35 | 1 | 0 | 0 |
| 12 | Ben White | 4 | eng ENG | RB | 24-301 | 90 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 61 | 22 | 7 | 4 | 5 | 0 | 0 | 0.1 | 1 | 0 | 29 | 40 | 72.5 | 5 | 25 | 2 | 1 | 1 |
| 13 | Aaron Ramsdale | 1 | eng ENG | GK | 24-083 | 90 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 24 | 32 | 75 | 0 | 21 | 0 | 0 | 0 |
| 14 | 14 Players | nan | nan | nan | nan | 990 | 1 | 1 | 0 | 0 | 10 | 2 | 2 | 0 | 599 | 158 | 25 | 17 | 21 | 1.1 | 1.1 | 0.5 | 18 | 2 | 378 | 465 | 81.3 | 35 | 361 | 36 | 11 | 15 |
Could you elaborate more? Maybe you could split the raw text by comma and then convert it to a dataframe, like:
list_of_string = input().split(',')
df = pd.DataFrame(list_of_string)
df.to_csv('yourfile.csv')
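Splitting on commas alone puts every value into one long column. If the pasted text is already valid CSV with one record per line, pandas can parse it directly from a string. The following is a minimal sketch under that assumption; the sample is an abbreviated subset of the question's columns, and the name pasted is illustrative.

```python
import io

import pandas as pd

# Abbreviated sample of the pasted data (a few columns only, for brevity).
# For a real multi-line paste, the text could instead be read in one go,
# for example with: pasted = sys.stdin.read()
pasted = """Player,#,Nation,Pos,Min
Gabriel Jesus,9,br BRA,FW,82
Eddie Nketiah,14,eng ENG,FW,8
"""

# StringIO wraps the string in a file-like object that read_csv accepts.
df = pd.read_csv(io.StringIO(pasted))

# With no path argument, to_csv returns the CSV text;
# passing a file name such as "output.csv" would write it to disk instead.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Note that a single input() call reads only one line, which may be why only the header row was exported in the original attempt.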
The correct approach is as proposed by chitown88, however if you want to copy paste the data by hand into the terminal and get a csv you can do something like this:
import pandas as pd
from datetime import datetime

while True:
    print("Enter/Paste your content. Ctrl-D or Ctrl-Z ( windows ) to save it.")
    contents = []
    while True:
        try:
            line = input()
        except EOFError:
            break
        contents.append(line)

    df = pd.DataFrame(contents)
    df.to_csv(f"df_{int(datetime.now().timestamp())}.csv", index=None)
Start the Python script, paste the data into the terminal, press CTRL+D and press enter to export the data you pasted into the terminal into a csv file.
You can use a user-input-controlled while loop to get user input. Finally, you may exit depending on the user’s choice. Look at the code below:
user_input = 'Y'
while user_input.lower() == 'y':
    # Run your code here.
    user_input = input('Do you want to add one more entry: Y or N?')
This is the most intuitive and understandable solution I could come up with. It uses a basic linear-algebra idea (transposing the rows into columns) to solve the problem, which I find pretty neat. I recommend finding another way to parse the data, though. Check out beautifulsoup and requests.
import pandas as pd  # for dataframe

data = '''
,,,,,,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Expected,Expected,Expected,SCA,SCA,Passes,Passes,Passes,Passes,Carries,Carries,Dribbles,Dribbles,-additional
Player,#,Nation,Pos,Age,Min,Gls,Ast,PK,PKatt,Sh,SoT,CrdY,CrdR,Touches,Press,Tkl,Int,Blocks,xG,npxG,xA,SCA,GCA,Cmp,Att,Cmp%,Prog,Carries,Prog,Succ,Att,-9999
Gabriel Jesus,9,br BRA,FW,25-124,82,0,0,0,0,1,0,0,0,40,13,1,1,0,0.1,0.1,0.0,4,0,20,27,74.1,2,33,1,4,5,b66315ae
Eddie Nketiah,14,eng ENG,FW,23-067,8,0,0,0,0,0,0,0,0,6,2,0,0,0,0.0,0.0,0.1,2,0,4,4,100.0,1,4,1,0,0,a53649b7
Martinelli,11,br BRA,LW,21-048,90,1,0,0,0,2,1,0,0,38,21,0,2,1,0.6,0.6,0.1,1,0,24,28,85.7,1,34,5,3,4,48a5a5d6
Bukayo Saka,7,eng ENG,RW,20-334,90,0,0,0,0,3,0,0,0,52,23,3,0,3,0.2,0.2,0.0,2,1,24,36,66.7,2,37,8,2,2,bc7dc64d
Martin Ødegaard,8,no NOR,AM,23-231,89,0,0,0,0,2,0,0,0,50,22,2,1,2,0.1,0.1,0.0,2,0,30,39,76.9,5,28,3,1,2,79300479
Albert Sambi Lokonga,23,be BEL,CM,22-287,1,0,0,0,0,0,0,0,0,2,0,0,0,0,0.0,0.0,0.0,0,0,1,1,100.0,0,1,1,0,0,1b4f1169
Granit Xhaka,34,ch SUI,DM,29-312,90,0,0,0,0,0,0,1,0,60,5,0,2,3,0.0,0.0,0.0,4,0,42,49,85.7,6,32,2,0,0,e61b8aee
Thomas Partey,5,gh GHA,DM,29-053,90,0,0,0,0,1,0,0,0,62,25,7,1,2,0.1,0.1,0.0,0,0,40,47,85.1,5,26,4,0,1,529f49ab
Oleksandr Zinchenko,35,ua UKR,LB,25-233,82,0,1,0,0,1,1,0,0,64,16,3,3,1,0.0,0.0,0.3,2,1,44,54,81.5,6,36,5,0,0,51cf8561
Kieran Tierney,3,sct SCO,LBWB,25-061,8,0,0,0,0,0,0,0,0,6,1,0,0,0,0.0,0.0,0.0,0,0,2,4,50.0,0,1,0,0,0,fce2302c
Gabriel Dos Santos,6,br BRA,CB,24-229,90,0,0,0,0,0,0,0,0,67,5,1,1,2,0.0,0.0,0.0,0,0,52,58,89.7,1,48,3,0,0,67ac5bb8
William Saliba,12,fr FRA,CB,21-134,90,0,0,0,0,0,0,0,0,58,3,1,2,2,0.0,0.0,0.0,0,0,42,46,91.3,1,35,1,0,0,972aeb2a
Ben White,4,eng ENG,RB,24-301,90,0,0,0,0,0,0,1,0,61,22,7,4,5,0.0,0.0,0.1,1,0,29,40,72.5,5,25,2,1,1,35e413f1
Aaron Ramsdale,1,eng ENG,GK,24-083,90,0,0,0,0,0,0,0,0,33,0,0,0,0,0.0,0.0,0.0,0,0,24,32,75.0,0,21,0,0,0,466fb2c5
14 Players,,,,,990,1,1,0,0,10,2,2,0,599,158,25,17,21,1.1,1.1,0.5,18,2,378,465,81.3,35,361,36,11,15,-9999
'''
# you can just replace data with user input

def tryNum(x):
    # return x as a float if it parses as a number, otherwise return it unchanged
    try:
        return float(x)
    except ValueError:
        return x

rows = [i.split(',')[:-1] for i in data.split('\n')[2:-2]]  # removing useless lines
col_names = [i for i in rows[0]]  # fetching all column names
# get all column info by transposing the "matrix" if you will
cols = [[tryNum(rows[j][i]) for j in range(1, len(rows))] for i in range(len(rows[0]))]
full = {}  # setting up the dictionary
for i, y in zip(col_names, cols):  # putting the data in the dict
    full[i] = y
df = pd.DataFrame(data=full)  # uploading it all to the df
print(df.head())
How to find the max of columns with the same name
I'm having some trouble with this data frame, where columns that share a name have to be reduced to a single column whose value is "1" if at least one of them is "1".
+---+---+---+---+---+---+---+---+---+
| a | a | a | b | c | c | c | d | d |
+---+---+---+---+---+---+---+---+---+
| 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 |
| 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
+---+---+---+---+---+---+---+---+---+
Applying an "or" condition to every column of a huge dataset could be a time-consuming task, so I am having trouble figuring it out. I used max(axis=1, level=0) but still couldn't make it work. My desired output:
+---+---+---+---+
| a | b | c | d |
+---+---+---+---+
| 1 | 1 | 1 | 1 |
| 0 | 1 | 1 | 1 |
| 1 | 0 | 1 | 0 |
+---+---+---+---+
Check with max:
df = df.max(level=0, axis=1)
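As far as I know, the level argument of DataFrame.max was deprecated and then removed in pandas 2.0, so the call above may raise a TypeError on recent versions. A sketch of an equivalent that groups the transposed frame by column name instead, using the question's example data (the name reduced is illustrative):

```python
import pandas as pd

# Example frame from the question, with repeated column names.
df = pd.DataFrame(
    [
        [1, 0, 0, 1, 1, 0, 0, 1, 1],
        [0, 0, 0, 1, 0, 1, 1, 1, 0],
        [0, 0, 1, 0, 0, 0, 1, 0, 0],
    ],
    columns=list("aaabcccdd"),
)

# Transpose so the column names become the index, take the max per name,
# then transpose back to the original orientation.
reduced = df.T.groupby(level=0).max().T

print(reduced)
```

For 0/1 data, the maximum per name is the same as a logical "or" across the duplicated columns.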
Filter all rows from groupby object
I have a dataframe like the one below:
+-----------+------------+---------------+------+-----+-------+
| InvoiceNo | CategoryNo | Invoice Value | Item | Qty | Price |
+-----------+------------+---------------+------+-----+-------+
| 1 | 1 | 77 | 128 | 1 | 10 |
| 1 | 1 | 77 | 101 | 1 | 11 |
| 1 | 2 | 77 | 105 | 3 | 12 |
| 1 | 3 | 77 | 129 | 2 | 10 |
| 2 | 1 | 21 | 145 | 1 | 9 |
| 2 | 2 | 21 | 130 | 1 | 12 |
+-----------+------------+---------------+------+-----+-------+
After grouping by 'InvoiceNo' and 'CategoryNo', I want to keep the entire group if any of the items in the list item_list = [128,129,130] is present in that group. My desired output is as below:
+-----------+------------+---------------+------+-----+-------+
| InvoiceNo | CategoryNo | Invoice Value | Item | Qty | Price |
+-----------+------------+---------------+------+-----+-------+
| 1 | 1 | 77 | 128 | 1 | 10 |
| 1 | 1 | 77 | 101 | 1 | 11 |
| 1 | 3 | 77 | 129 | 2 | 10 |
| 2 | 2 | 21 | 130 | 1 | 12 |
+-----------+------------+---------------+------+-----+-------+
I know how to filter a dataframe using isin(), but I'm not sure how to do it with groupby(). So far I have tried the code below:
import pandas as pd
df = pd.read_csv('data.csv')
item_list = [128,129,130]
df.groupby(['InvoiceNo','CategoryNo'])['Item'].isin(item_list)
but nothing happens. Please guide me on how to solve this issue.
You can do something like this:
s = (df['Item'].isin(item_list)
       .groupby([df['InvoiceNo'], df['CategoryNo']])
       .transform('any')
    )
df[s]
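To see the group-wise mask in action, here is a self-contained sketch that builds the question's table in code rather than reading data.csv; the names keep and filtered are illustrative.

```python
import pandas as pd

# Data copied from the question's table.
df = pd.DataFrame({
    "InvoiceNo": [1, 1, 1, 1, 2, 2],
    "CategoryNo": [1, 1, 2, 3, 1, 2],
    "Invoice Value": [77, 77, 77, 77, 21, 21],
    "Item": [128, 101, 105, 129, 145, 130],
    "Qty": [1, 1, 3, 2, 1, 1],
    "Price": [10, 11, 12, 10, 9, 12],
})
item_list = [128, 129, 130]

# Row-level flag, then broadcast "any row in my group matched" back to every row.
keep = (
    df["Item"].isin(item_list)
    .groupby([df["InvoiceNo"], df["CategoryNo"]])
    .transform("any")
)

filtered = df[keep]
print(filtered)
```

Because transform returns a result aligned to the original index, the mask can be used directly for boolean indexing, which is why item 101 survives alongside 128.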
Logical indexing in pandas dataframes [duplicate]
I have some data like this:
+-----------+---------+-------+
| Duration | Outcome | Event |
+-----------+---------+-------+
| 421 | 0 | 1 |
| 421 | 0 | 1 |
| 261 | 0 | 1 |
| 24 | 0 | 1 |
| 27 | 0 | 1 |
| 613 | 0 | 1 |
| 2454 | 0 | 1 |
| 227 | 0 | 1 |
| 2560 | 0 | 1 |
| 229 | 0 | 1 |
| 2242 | 0 | 1 |
| 6680 | 0 | 1 |
| 1172 | 0 | 1 |
| 5656 | 0 | 1 |
| 5082 | 0 | 1 |
| 7239 | 0 | 1 |
| 127 | 0 | 1 |
| 128 | 0 | 1 |
| 128 | 0 | 1 |
| 7569 | 1 | 1 |
| 324 | 0 | 2 |
| 6395 | 0 | 2 |
| 6196 | 0 | 2 |
| 31 | 0 | 2 |
| 228 | 0 | 2 |
| 274 | 0 | 2 |
| 270 | 0 | 2 |
| 275 | 0 | 2 |
| 232 | 0 | 2 |
| 7310 | 0 | 2 |
| 7644 | 1 | 2 |
| 6949 | 0 | 3 |
| 6903 | 1 | 3 |
| 6942 | 0 | 4 |
| 7031 | 1 | 4 |
+-----------+---------+-------+
Now, for each Event, with the Outcome 0/1 considered as Fail/Pass, I want to sum the total Duration of Fail/Pass events separately in 2 new columns (or 1, whatever ensures readability). I'm new to dataframes and I feel significant logical indexing is involved here. What is the best way to approach this problem?
df.groupby(['Event', 'Outcome'])['Duration'].sum()
So you group by both the event and then the outcome, look at the Duration column, then take the sum of each group.
You can also try:
pd.pivot_table(index='Event', columns='Outcome', values='Duration', data=df, aggfunc='sum')
which gives you a table with two columns:
+---------+-------+------+
| Outcome | 0 | 1 |
+---------+-------+------+
| Event | | |
+---------+-------+------+
| 1 | 35691 | 7569 |
| 2 | 21535 | 7644 |
| 3 | 6949 | 6903 |
| 4 | 6942 | 7031 |
+---------+-------+------+
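Both answers can be combined to get the two labelled columns the question asks for. This is a sketch only: it uses a small subset of the question's rows (the last three rows of Event 2 plus Events 3 and 4), and the Fail/Pass column names and the variable totals are illustrative choices.

```python
import pandas as pd

# Subset of the question's data, to keep the example short.
df = pd.DataFrame({
    "Duration": [232, 7310, 7644, 6949, 6903, 6942, 7031],
    "Outcome": [0, 0, 1, 0, 1, 0, 1],
    "Event": [2, 2, 2, 3, 3, 4, 4],
})

totals = (
    df.groupby(["Event", "Outcome"])["Duration"].sum()
    .unstack("Outcome", fill_value=0)          # one column per Outcome value
    .rename(columns={0: "Fail", 1: "Pass"})    # label 0/1 as Fail/Pass
)

print(totals)
```

The result has one row per Event, so it can be merged back onto the original frame on Event if the totals are needed alongside each row.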
Use another column's value if a condition is met in pandas
Assuming I have the following table:
+----+---+---+
| A | B | C |
+----+---+---+
| 1 | 1 | 3 |
| 2 | 2 | 7 |
| 6 | 3 | 2 |
| -1 | 9 | 0 |
| 2 | 1 | 3 |
| -8 | 8 | 2 |
| 2 | 1 | 9 |
+----+---+---+
If column A's value is negative, replace column B's value with the value of column C; if not, do nothing. This is the desired output:
+----+---+---+
| A | B | C |
+----+---+---+
| 1 | 1 | 3 |
| 2 | 2 | 7 |
| 6 | 3 | 2 |
| -1 | 0 | 0 |
| 2 | 1 | 3 |
| -8 | 2 | 2 |
| 2 | 1 | 9 |
+----+---+---+
I've been trying the following code, but it's not working:
#not working
result.loc(result["A"] < 0,result['B'] = result['C'].iloc[0])
result.B[result.A < 0] = result.C
Try this:
df.loc[df['A'] < 0, 'B'] = df['C']
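A runnable sketch of the .loc approach on the question's table (the name mask is illustrative). A single .loc call with a row mask and a column label assigns in one step, whereas chained indexing such as result.B[...] = ... can trigger a SettingWithCopyWarning and is not guaranteed to modify the original frame.

```python
import pandas as pd

# Table from the question.
result = pd.DataFrame({
    "A": [1, 2, 6, -1, 2, -8, 2],
    "B": [1, 2, 3, 9, 1, 8, 1],
    "C": [3, 7, 2, 0, 3, 2, 9],
})

# Rows where A is negative.
mask = result["A"] < 0

# Overwrite B with C on those rows only; all other rows are left untouched.
result.loc[mask, "B"] = result.loc[mask, "C"]

print(result)
```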