Filter all rows from groupby object - python

I have a dataframe like below
+-----------+------------+---------------+------+-----+-------+
| InvoiceNo | CategoryNo | Invoice Value | Item | Qty | Price |
+-----------+------------+---------------+------+-----+-------+
| 1 | 1 | 77 | 128 | 1 | 10 |
| 1 | 1 | 77 | 101 | 1 | 11 |
| 1 | 2 | 77 | 105 | 3 | 12 |
| 1 | 3 | 77 | 129 | 2 | 10 |
| 2 | 1 | 21 | 145 | 1 | 9 |
| 2 | 2 | 21 | 130 | 1 | 12 |
+-----------+------------+---------------+------+-----+-------+
After grouping by 'InvoiceNo' and 'CategoryNo', I want to keep an entire group if any of the items in the list item_list = [128,129,130] is present in that group.
My desired output is as below
+-----------+------------+---------------+------+-----+-------+
| InvoiceNo | CategoryNo | Invoice Value | Item | Qty | Price |
+-----------+------------+---------------+------+-----+-------+
| 1 | 1 | 77 | 128 | 1 | 10 |
| 1 | 1 | 77 | 101 | 1 | 11 |
| 1 | 3 | 77 | 129 | 2 | 10 |
| 2 | 2 | 21 | 130 | 1 | 12 |
+-----------+------------+---------------+------+-----+-------+
I know how to filter a dataframe using isin(), but I am not sure how to do it with groupby().
So far I have tried the following:
import pandas as pd
df = pd.read_csv('data.csv')
item_list = [128,129,130]
df.groupby(['InvoiceNo','CategoryNo'])['Item'].isin(item_list)
but nothing happens. Please guide me on how to solve this issue.

You can do something like this:
s = (df['Item'].isin(item_list)
     .groupby([df['InvoiceNo'], df['CategoryNo']])
     .transform('any')
     )
df[s]
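To make the idea concrete, here is a minimal self-contained sketch that rebuilds the question's table in memory (instead of reading 'data.csv') and applies the same isin / groupby / transform('any') chain. The variable names hit, mask and filtered are only illustrative.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "InvoiceNo":     [1, 1, 1, 1, 2, 2],
    "CategoryNo":    [1, 1, 2, 3, 1, 2],
    "Invoice Value": [77, 77, 77, 77, 21, 21],
    "Item":          [128, 101, 105, 129, 145, 130],
    "Qty":           [1, 1, 3, 2, 1, 1],
    "Price":         [10, 11, 12, 10, 9, 12],
})
item_list = [128, 129, 130]

# Row-level flag: is this row's Item in the list?
hit = df["Item"].isin(item_list)

# Broadcast "at least one hit in my group" back to every row of that group.
mask = hit.groupby([df["InvoiceNo"], df["CategoryNo"]]).transform("any")

# Keep only the rows whose (InvoiceNo, CategoryNo) group contains a listed item.
filtered = df[mask]
print(filtered)
```

Because transform returns a result aligned to the original rows, the boolean mask can be used directly for indexing. An alternative with the same intent is df.groupby([...]).filter(lambda g: g['Item'].isin(item_list).any()), which is usually slower on large frames because it calls a Python function per group.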


Is there a method in turning user input into csv format?

This is the example data that would be pasted into an input() prompt and ideally I would like it to be processed and made into a csv file through python:
,,,,,,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Expected,Expected,Expected,SCA,SCA,Passes,Passes,Passes,Passes,Carries,Carries,Dribbles,Dribbles,-additional
Player,#,Nation,Pos,Age,Min,Gls,Ast,PK,PKatt,Sh,SoT,CrdY,CrdR,Touches,Press,Tkl,Int,Blocks,xG,npxG,xA,SCA,GCA,Cmp,Att,Cmp%,Prog,Carries,Prog,Succ,Att,-9999
Gabriel Jesus,9,br BRA,FW,25-124,82,0,0,0,0,1,0,0,0,40,13,1,1,0,0.1,0.1,0.0,4,0,20,27,74.1,2,33,1,4,5,b66315ae
Eddie Nketiah,14,eng ENG,FW,23-067,8,0,0,0,0,0,0,0,0,6,2,0,0,0,0.0,0.0,0.1,2,0,4,4,100.0,1,4,1,0,0,a53649b7
Martinelli,11,br BRA,LW,21-048,90,1,0,0,0,2,1,0,0,38,21,0,2,1,0.6,0.6,0.1,1,0,24,28,85.7,1,34,5,3,4,48a5a5d6
Bukayo Saka,7,eng ENG,RW,20-334,90,0,0,0,0,3,0,0,0,52,23,3,0,3,0.2,0.2,0.0,2,1,24,36,66.7,2,37,8,2,2,bc7dc64d
Martin Ødegaard,8,no NOR,AM,23-231,89,0,0,0,0,2,0,0,0,50,22,2,1,2,0.1,0.1,0.0,2,0,30,39,76.9,5,28,3,1,2,79300479
Albert Sambi Lokonga,23,be BEL,CM,22-287,1,0,0,0,0,0,0,0,0,2,0,0,0,0,0.0,0.0,0.0,0,0,1,1,100.0,0,1,1,0,0,1b4f1169
Granit Xhaka,34,ch SUI,DM,29-312,90,0,0,0,0,0,0,1,0,60,5,0,2,3,0.0,0.0,0.0,4,0,42,49,85.7,6,32,2,0,0,e61b8aee
Thomas Partey,5,gh GHA,DM,29-053,90,0,0,0,0,1,0,0,0,62,25,7,1,2,0.1,0.1,0.0,0,0,40,47,85.1,5,26,4,0,1,529f49ab
Oleksandr Zinchenko,35,ua UKR,LB,25-233,82,0,1,0,0,1,1,0,0,64,16,3,3,1,0.0,0.0,0.3,2,1,44,54,81.5,6,36,5,0,0,51cf8561
Kieran Tierney,3,sct SCO,LBWB,25-061,8,0,0,0,0,0,0,0,0,6,1,0,0,0,0.0,0.0,0.0,0,0,2,4,50.0,0,1,0,0,0,fce2302c
Gabriel Dos Santos,6,br BRA,CB,24-229,90,0,0,0,0,0,0,0,0,67,5,1,1,2,0.0,0.0,0.0,0,0,52,58,89.7,1,48,3,0,0,67ac5bb8
William Saliba,12,fr FRA,CB,21-134,90,0,0,0,0,0,0,0,0,58,3,1,2,2,0.0,0.0,0.0,0,0,42,46,91.3,1,35,1,0,0,972aeb2a
Ben White,4,eng ENG,RB,24-301,90,0,0,0,0,0,0,1,0,61,22,7,4,5,0.0,0.0,0.1,1,0,29,40,72.5,5,25,2,1,1,35e413f1
Aaron Ramsdale,1,eng ENG,GK,24-083,90,0,0,0,0,0,0,0,0,33,0,0,0,0,0.0,0.0,0.0,0,0,24,32,75.0,0,21,0,0,0,466fb2c5
14 Players,,,,,990,1,1,0,0,10,2,2,0,599,158,25,17,21,1.1,1.1,0.5,18,2,378,465,81.3,35,361,36,11,15,-9999
The link to the table is: https://fbref.com/en/matches/e62f6e78/Crystal-Palace-Arsenal-August-5-2022-Premier-League#stats_18bb7c10_summary
I have attempted to use pandas dataframe but I am only able to export the first row of headers and nothing else (only the items before player).
Would have been nice for you to include your attempt.
Pandas works just fine:
import pandas as pd
url = 'https://fbref.com/en/matches/e62f6e78/Crystal-Palace-Arsenal-August-5-2022-Premier-League#stats_18bb7c10_summary'
df = pd.read_html(url)[10]
cols = [f'{each[0]}_{each[1]}' if 'Unnamed' not in each[0] else f'{each[1]}' for each in df.columns]
df.columns = cols
df.to_csv('output.csv', index=False)
Output:
print(df.to_markdown())
| | Player | # | Nation | Pos | Age | Min | Gls | Ast | PK | PKatt | Sh | SoT | CrdY | CrdR | Touches | Press | Tkl | Int | Blocks | xG | npxG | xA | SCA | GCA | Cmp | Att | Cmp% | Prog | Carries | Prog.1 | Succ | Att.1 |
|---:|:---------------------|----:|:---------|:------|:-------|------:|------:|------:|-----:|--------:|-----:|------:|-------:|-------:|----------:|--------:|------:|------:|---------:|-----:|-------:|-----:|------:|------:|------:|------:|-------:|-------:|----------:|---------:|-------:|--------:|
| 0 | Gabriel Jesus | 9 | br BRA | FW | 25-124 | 82 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 40 | 13 | 1 | 1 | 0 | 0.1 | 0.1 | 0 | 4 | 0 | 20 | 27 | 74.1 | 2 | 33 | 1 | 4 | 5 |
| 1 | Eddie Nketiah | 14 | eng ENG | FW | 23-067 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 2 | 0 | 0 | 0 | 0 | 0 | 0.1 | 2 | 0 | 4 | 4 | 100 | 1 | 4 | 1 | 0 | 0 |
| 2 | Martinelli | 11 | br BRA | LW | 21-048 | 90 | 1 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 38 | 21 | 0 | 2 | 1 | 0.6 | 0.6 | 0.1 | 1 | 0 | 24 | 28 | 85.7 | 1 | 34 | 5 | 3 | 4 |
| 3 | Bukayo Saka | 7 | eng ENG | RW | 20-334 | 90 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 52 | 23 | 3 | 0 | 3 | 0.2 | 0.2 | 0 | 2 | 1 | 24 | 36 | 66.7 | 2 | 37 | 8 | 2 | 2 |
| 4 | Martin Ødegaard | 8 | no NOR | AM | 23-231 | 89 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 50 | 22 | 2 | 1 | 2 | 0.1 | 0.1 | 0 | 2 | 0 | 30 | 39 | 76.9 | 5 | 28 | 3 | 1 | 2 |
| 5 | Albert Sambi Lokonga | 23 | be BEL | CM | 22-287 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 100 | 0 | 1 | 1 | 0 | 0 |
| 6 | Granit Xhaka | 34 | ch SUI | DM | 29-312 | 90 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 60 | 5 | 0 | 2 | 3 | 0 | 0 | 0 | 4 | 0 | 42 | 49 | 85.7 | 6 | 32 | 2 | 0 | 0 |
| 7 | Thomas Partey | 5 | gh GHA | DM | 29-053 | 90 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 62 | 25 | 7 | 1 | 2 | 0.1 | 0.1 | 0 | 0 | 0 | 40 | 47 | 85.1 | 5 | 26 | 4 | 0 | 1 |
| 8 | Oleksandr Zinchenko | 35 | ua UKR | LB | 25-233 | 82 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 64 | 16 | 3 | 3 | 1 | 0 | 0 | 0.3 | 2 | 1 | 44 | 54 | 81.5 | 6 | 36 | 5 | 0 | 0 |
| 9 | Kieran Tierney | 3 | sct SCO | LB,WB | 25-061 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 4 | 50 | 0 | 1 | 0 | 0 | 0 |
| 10 | Gabriel Dos Santos | 6 | br BRA | CB | 24-229 | 90 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 67 | 5 | 1 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 52 | 58 | 89.7 | 1 | 48 | 3 | 0 | 0 |
| 11 | William Saliba | 12 | fr FRA | CB | 21-134 | 90 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 58 | 3 | 1 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 42 | 46 | 91.3 | 1 | 35 | 1 | 0 | 0 |
| 12 | Ben White | 4 | eng ENG | RB | 24-301 | 90 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 61 | 22 | 7 | 4 | 5 | 0 | 0 | 0.1 | 1 | 0 | 29 | 40 | 72.5 | 5 | 25 | 2 | 1 | 1 |
| 13 | Aaron Ramsdale | 1 | eng ENG | GK | 24-083 | 90 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 24 | 32 | 75 | 0 | 21 | 0 | 0 | 0 |
| 14 | 14 Players | nan | nan | nan | nan | 990 | 1 | 1 | 0 | 0 | 10 | 2 | 2 | 0 | 599 | 158 | 25 | 17 | 21 | 1.1 | 1.1 | 0.5 | 18 | 2 | 378 | 465 | 81.3 | 35 | 361 | 36 | 11 | 15 |
could you elaborate more?
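Maybe you could split the raw text by comma and then convert it to a dataframe,
like:
import pandas as pd
raw_text = input()  # input is a function, so call it to get the pasted string
list_of_string = raw_text.split(',')
df = pd.DataFrame(list_of_string)
df.to_csv('yourfile.csv')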
The correct approach is the one proposed by chitown88. However, if you want to copy and paste the data by hand into the terminal and get a csv, you can do something like this:
import pandas as pd
from datetime import datetime
while True:
    print("Enter/Paste your content. Ctrl-D or Ctrl-Z ( windows ) to save it.")
    contents = []
    while True:
        try:
            line = input()
        except EOFError:
            break
        contents.append(line)
    # split each pasted line on commas so every value lands in its own column
    df = pd.DataFrame([line.split(',') for line in contents])
    df.to_csv(f"df_{int(datetime.now().timestamp())}.csv", index=False, header=False)
Start the Python script, paste the data into the terminal, then press CTRL+D and Enter to export the pasted data into a csv file.
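If the pasted text is already available as one string, another option is to let pandas parse it directly. The following is only a sketch: the pasted variable holds a shortened, hypothetical stand-in for the real table (two header rows followed by data rows), and in a real script it could come from sys.stdin.read() instead.

```python
import io

import pandas as pd

# Shortened, hypothetical stand-in for the pasted text:
# a grouping header row, a column header row, then the data rows.
pasted = """,,Performance,Performance
Player,Min,Gls,Ast
Gabriel Jesus,82,0,0
Martinelli,90,1,0
"""

# skiprows=1 drops the grouping row, so the second row becomes the header.
df = pd.read_csv(io.StringIO(pasted), skiprows=1)

# to_csv without a path returns the CSV text; pass a filename to write a file.
csv_text = df.to_csv(index=False)
print(csv_text)
```

With the full table, repeated header names such as Prog and Att would be disambiguated by read_csv as Prog.1 and Att.1.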
You can use a while loop controlled by user input to keep collecting entries, and exit depending on the user's choice. Look at the code below:
user_input = 'Y'
while user_input.lower() == 'y':
    # Run your code here.
    user_input = input('Do you want to add one more entry: Y or N?')
This is the most intuitive and understandable solution I could come up with. It uses a bit of basic linear algebra (transposing the rows into columns), which I find pretty neat. That said, I recommend finding another way to parse the data: check out beautifulsoup and requests.
import pandas as pd#for dataframe
data = '''
,,,,,,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Expected,Expected,Expected,SCA,SCA,Passes,Passes,Passes,Passes,Carries,Carries,Dribbles,Dribbles,-additional
Player,#,Nation,Pos,Age,Min,Gls,Ast,PK,PKatt,Sh,SoT,CrdY,CrdR,Touches,Press,Tkl,Int,Blocks,xG,npxG,xA,SCA,GCA,Cmp,Att,Cmp%,Prog,Carries,Prog,Succ,Att,-9999
Gabriel Jesus,9,br BRA,FW,25-124,82,0,0,0,0,1,0,0,0,40,13,1,1,0,0.1,0.1,0.0,4,0,20,27,74.1,2,33,1,4,5,b66315ae
Eddie Nketiah,14,eng ENG,FW,23-067,8,0,0,0,0,0,0,0,0,6,2,0,0,0,0.0,0.0,0.1,2,0,4,4,100.0,1,4,1,0,0,a53649b7
Martinelli,11,br BRA,LW,21-048,90,1,0,0,0,2,1,0,0,38,21,0,2,1,0.6,0.6,0.1,1,0,24,28,85.7,1,34,5,3,4,48a5a5d6
Bukayo Saka,7,eng ENG,RW,20-334,90,0,0,0,0,3,0,0,0,52,23,3,0,3,0.2,0.2,0.0,2,1,24,36,66.7,2,37,8,2,2,bc7dc64d
Martin Ødegaard,8,no NOR,AM,23-231,89,0,0,0,0,2,0,0,0,50,22,2,1,2,0.1,0.1,0.0,2,0,30,39,76.9,5,28,3,1,2,79300479
Albert Sambi Lokonga,23,be BEL,CM,22-287,1,0,0,0,0,0,0,0,0,2,0,0,0,0,0.0,0.0,0.0,0,0,1,1,100.0,0,1,1,0,0,1b4f1169
Granit Xhaka,34,ch SUI,DM,29-312,90,0,0,0,0,0,0,1,0,60,5,0,2,3,0.0,0.0,0.0,4,0,42,49,85.7,6,32,2,0,0,e61b8aee
Thomas Partey,5,gh GHA,DM,29-053,90,0,0,0,0,1,0,0,0,62,25,7,1,2,0.1,0.1,0.0,0,0,40,47,85.1,5,26,4,0,1,529f49ab
Oleksandr Zinchenko,35,ua UKR,LB,25-233,82,0,1,0,0,1,1,0,0,64,16,3,3,1,0.0,0.0,0.3,2,1,44,54,81.5,6,36,5,0,0,51cf8561
Kieran Tierney,3,sct SCO,LBWB,25-061,8,0,0,0,0,0,0,0,0,6,1,0,0,0,0.0,0.0,0.0,0,0,2,4,50.0,0,1,0,0,0,fce2302c
Gabriel Dos Santos,6,br BRA,CB,24-229,90,0,0,0,0,0,0,0,0,67,5,1,1,2,0.0,0.0,0.0,0,0,52,58,89.7,1,48,3,0,0,67ac5bb8
William Saliba,12,fr FRA,CB,21-134,90,0,0,0,0,0,0,0,0,58,3,1,2,2,0.0,0.0,0.0,0,0,42,46,91.3,1,35,1,0,0,972aeb2a
Ben White,4,eng ENG,RB,24-301,90,0,0,0,0,0,0,1,0,61,22,7,4,5,0.0,0.0,0.1,1,0,29,40,72.5,5,25,2,1,1,35e413f1
Aaron Ramsdale,1,eng ENG,GK,24-083,90,0,0,0,0,0,0,0,0,33,0,0,0,0,0.0,0.0,0.0,0,0,24,32,75.0,0,21,0,0,0,466fb2c5
14 Players,,,,,990,1,1,0,0,10,2,2,0,599,158,25,17,21,1.1,1.1,0.5,18,2,378,465,81.3,35,361,36,11,15,-9999
'''
# you can just replace data with user input
def tryNum(x):  # return x as a float if it parses as a number, otherwise return x unchanged
    try:
        return float(x)
    except ValueError:
        return x
rows = [i.split(',')[:-1] for i in data.split('\n')[2:-2]]  # drop the grouping header row, the totals row and the last column
col_names = list(rows[0])  # fetching all column names
cols = [[tryNum(rows[j][i]) for j in range(1, len(rows))] for i in range(len(rows[0]))]  # get one list per column by transposing the "matrix", if you will
full = {}  # setting up the dictionary
for i, y in zip(col_names, cols):  # note: repeated names (Prog, Att) overwrite the earlier column in the dict
    full[i] = y
df = pd.DataFrame(data=full)  # loading it all into the df
print(df.head())

How to get the column values of a Dataframe into another dataframe as a new column after matching the values in columns that both dataframes have?

I'm trying to create a new column in a DataFrame and fill it with values from a different DataFrame, by first matching on the columns that both DataFrames have. For example:
df1 >>>
| name | team | week | dates | interceptions | pass_yds | rating |
| ---- | ---- | -----| ---------- | ------------- | --------- | -------- |
| maho | KC | 1 | 2020-09-10 | 0 | 300 | 105 |
| went | PHI | 1 | 2020-09-13 | 2 | 225 | 74 |
| lock | DEN | 1 | 2020-09-14 | 0 | 150 | 89 |
| dris | DEN | 2 | 2020-09-20 | 1 | 220 | 95 |
| went | PHI | 2 | 2020-09-20 | 2 | 250 | 64 |
| maho | KC | 2 | 2020-09-21 | 1 | 245 | 101 |
df2 >>>
| name | team | week | catches | rec_yds | rec_tds |
| ---- | ---- | -----| ------- | ------- | ------- |
| ertz | PHI | 1 | 5 | 58 | 1 |
| fant | DEN | 2 | 6 | 79 | 0 |
| kelc | KC | 2 | 8 | 105 | 1 |
| fant | DEN | 1 | 3 | 29 | 0 |
| kelc | KC | 1 | 6 | 71 | 1 |
| ertz | PHI | 2 | 7 | 91 | 2 |
| goed | PHI | 2 | 2 | 15 | 0 |
I want to create a dates column in df2 with the values of the dates stored in the dates column in df1 after matching the teams and the weeks columns. After the matching, df2 in this example should look something like this:
df2 >>>
| name | team | week | catches | rec_yds | rec_tds | dates |
| ---- | ---- | -----| ------- | ------- | ------- | ---------- |
| ertz | PHI | 1 | 5 | 58 | 1 | 2020-09-13 |
| fant | DEN | 2 | 6 | 79 | 0 | 2020-09-20 |
| kelc | KC | 2 | 8 | 105 | 1 | 2020-09-21 |
| fant | DEN | 1 | 3 | 29 | 0 | 2020-09-14 |
| kelc | KC | 1 | 6 | 71 | 1 | 2020-09-10 |
| ertz | PHI | 2 | 7 | 91 | 2 | 2020-09-20 |
| goed | PHI | 2 | 2 | 15 | 0 | 2020-09-20 |
I'm looking for an optimal solution. I've already tried nested for loops that compare the week and team columns of both dataframes, but that hasn't worked. At this point I'm all out of ideas. Please help!
Disclaimer: The actual DataFrames I'm working with are a lot larger. They have a lot more rows, columns, and values (i.e. a lot more teams in the team columns, a lot more dates in the dates columns, and a lot more weeks in the week columns)
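One common way to do this kind of lookup is a left merge on the shared key columns. The sketch below rebuilds trimmed versions of df1 and df2 from the question (only the columns needed here) and assumes every (team, week) pair maps to a single date; the names lookup and result are illustrative.

```python
import pandas as pd

# Trimmed copies of the question's frames (only the relevant columns).
df1 = pd.DataFrame({
    "name":  ["maho", "went", "lock", "dris", "went", "maho"],
    "team":  ["KC", "PHI", "DEN", "DEN", "PHI", "KC"],
    "week":  [1, 1, 1, 2, 2, 2],
    "dates": ["2020-09-10", "2020-09-13", "2020-09-14",
              "2020-09-20", "2020-09-20", "2020-09-21"],
})
df2 = pd.DataFrame({
    "name":    ["ertz", "fant", "kelc", "fant", "kelc", "ertz", "goed"],
    "team":    ["PHI", "DEN", "KC", "DEN", "KC", "PHI", "PHI"],
    "week":    [1, 2, 2, 1, 1, 2, 2],
    "catches": [5, 6, 8, 3, 6, 7, 2],
})

# One row per (team, week); dropping duplicates keeps the merge from
# multiplying rows of df2 when df1 lists several players for the same game.
lookup = df1[["team", "week", "dates"]].drop_duplicates(["team", "week"])

# A left merge keeps every row of df2, in its original order.
result = df2.merge(lookup, on=["team", "week"], how="left")
print(result)
```

Rows of df2 with no matching (team, week) in df1 would get NaN in dates rather than being dropped. For KC in week 2 the merge yields 2020-09-21, the date df1 lists for that team and week.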

Logical indexing in pandas dataframes

I have some data like this:
+-----------+---------+-------+
| Duration | Outcome | Event |
+-----------+---------+-------+
| 421 | 0 | 1 |
| 421 | 0 | 1 |
| 261 | 0 | 1 |
| 24 | 0 | 1 |
| 27 | 0 | 1 |
| 613 | 0 | 1 |
| 2454 | 0 | 1 |
| 227 | 0 | 1 |
| 2560 | 0 | 1 |
| 229 | 0 | 1 |
| 2242 | 0 | 1 |
| 6680 | 0 | 1 |
| 1172 | 0 | 1 |
| 5656 | 0 | 1 |
| 5082 | 0 | 1 |
| 7239 | 0 | 1 |
| 127 | 0 | 1 |
| 128 | 0 | 1 |
| 128 | 0 | 1 |
| 7569 | 1 | 1 |
| 324 | 0 | 2 |
| 6395 | 0 | 2 |
| 6196 | 0 | 2 |
| 31 | 0 | 2 |
| 228 | 0 | 2 |
| 274 | 0 | 2 |
| 270 | 0 | 2 |
| 275 | 0 | 2 |
| 232 | 0 | 2 |
| 7310 | 0 | 2 |
| 7644 | 1 | 2 |
| 6949 | 0 | 3 |
| 6903 | 1 | 3 |
| 6942 | 0 | 4 |
| 7031 | 1 | 4 |
+-----------+---------+-------+
Now, for each Event, with the Outcome 0/1 considered as Fail/Pass, I want to sum the total Duration of Fail/Pass events separately in 2 new columns (or 1, whatever ensures readability).
I'm new to dataframes and I feel significant logical indexing is involved here. What is the best way to approach this problem?
df.groupby(['Event', 'Outcome'])['Duration'].sum()
This groups the rows by Event and then by Outcome, selects the Duration column, and sums it within each group.
You can also try:
pd.pivot_table(index='Event',
               columns='Outcome',
               values='Duration',
               data=df,
               aggfunc='sum')
which gives you a table with two columns:
+---------+-------+------+
| Outcome | 0 | 1 |
+---------+-------+------+
| Event | | |
+---------+-------+------+
| 1 | 35691 | 7569 |
| 2 | 21535 | 7644 |
| 3 | 6949 | 6903 |
| 4 | 6942 | 7031 |
+---------+-------+------+
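If named Fail/Pass columns are preferred, the grouped sums can also be reshaped with unstack. This is only a sketch on a small subset of the question's rows (the last three rows of Event 2 plus Events 3 and 4); the Fail/Pass labels follow the question's 0/1 convention.

```python
import pandas as pd

# Small subset of the question's data, enough to show the reshaping.
df = pd.DataFrame({
    "Duration": [232, 7310, 7644, 6949, 6903, 6942, 7031],
    "Outcome":  [0, 0, 1, 0, 1, 0, 1],
    "Event":    [2, 2, 2, 3, 3, 4, 4],
})

summary = (
    df.groupby(["Event", "Outcome"])["Duration"].sum()  # one sum per (Event, Outcome)
      .unstack("Outcome", fill_value=0)                 # Outcome values become columns
      .rename(columns={0: "Fail", 1: "Pass"})           # readable column labels
)
print(summary)
```

The fill_value=0 argument covers events that have no rows for one of the outcomes, which would otherwise show up as NaN.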

where condition is met, get last row pandas

+------------+----+----+----+-----+----+----+----+-----+
| WS | 1 | 2 | 3 | 4 | 1 | 2 | 3 | 4 |
+------------+----+----+----+-----+----+----+----+-----+
| w1 | 0 | 0 | 0 | 50 | 0 | 0 | 0 | 50 |
| w2 | 0 | 30 | 0 | 0 | 0 | 30 | 0 | 0 |
| d1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| d2 | 62 | 0 | 0 | 0 | 62 | 0 | 0 | 0 |
| Total | 62 | 30 | 0 | 50 | 62 | 30 | 0 | 50 |
| Cumulative | 62 | 92 | 92 | 142 | 62 | 92 | 92 | 142 |
+------------+----+----+----+-----+----+----+----+-----+
For each column that contains a value greater than 0, I would like to get the corresponding value from the "Cumulative" row.
For example, in column 4, since 50 > 0, I would like to get the corresponding "Cumulative" value of 142.
+------------+----+----+---+-----+----+----+---+-----+
| WS | 1 | 2 | 3 | 4 | 1 | 2 | 3 | 4 |
+------------+----+----+---+-----+----+----+---+-----+
| Cumulative | 62 | 92 | 0 | 142 | 62 | 92 | 0 | 142 |
+------------+----+----+---+-----+----+----+---+-----+
I have tried pandas loc and iloc, but I could not get them to do what I want.
Thank you in advance!
You definitely should have designed your question better, but anyway, here is a possible solution:
Assuming df is your DataFrame:
df[df>5].loc['Cumulative'].dropna()
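One reading of the desired output is: keep the "Cumulative" value for columns whose "Total" is greater than 0, and put 0 elsewhere. The sketch below follows that reading. It assumes WS is the index, uses only the first four columns so that the labels are unique, and the names mask and result are illustrative.

```python
import pandas as pd

# First four columns of the question's table, with WS as the index.
df = pd.DataFrame(
    {
        1: [0, 0, 0, 62, 62, 62],
        2: [0, 30, 0, 0, 30, 92],
        3: [0, 0, 0, 0, 0, 92],
        4: [50, 0, 0, 0, 50, 142],
    },
    index=["w1", "w2", "d1", "d2", "Total", "Cumulative"],
)

# True for columns that contain a positive entry (their Total is > 0).
mask = df.loc["Total"] > 0

# Keep the Cumulative value where the mask holds, otherwise use 0.
result = df.loc["Cumulative"].where(mask, 0)
print(result)
```

Column 3 has no positive entry, so its cumulative value of 92 is replaced by 0, matching the expected row in the question.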

Parsing out indices and values from pandas multi index dataframe

I have a dataframe in a similar format to this:
+--------+--------+----------+------+------+------+------+
| | | | | day1 | day2 | day3 |
+--------+--------+----------+------+------+------+------+
| id_one | id_two | id_three | date | | | |
| 18273 | 50 | 1 | 3 | 9 | 11 | 3 |
| | | | 4 | 26 | 27 | 68 |
| | | | 5 | 92 | 25 | 4 |
| | | | 6 | 60 | 72 | 83 |
| | 60 | 2 | 5 | 69 | 93 | 84 |
| | | | 6 | 69 | 30 | 12 |
| | | | 7 | 65 | 65 | 59 |
| | | | 8 | 57 | 88 | 59 |
| | 70 | 3 | 5 | 22 | 95 | 7 |
| | | | 6 | 40 | 24 | 20 |
| | | | 7 | 73 | 81 | 57 |
| | | | 8 | 43 | 8 | 66 |
+--------+--------+----------+------+------+------+------+
I am trying to create a tuple that contains id_one, id_two and the values that each grouping contains.
To test this, I am simply trying to print the ids and values like this:
for id_two, data in df.head(100).groupby(level='id_two'):
    print(id_two, data.values.ravel())
Which gives me the id_two and the data exactly as it should.
I am running into problems when I try to incorporate id_one. I tried this, but was met with the error ValueError: need more than 2 values to unpack
for id_one, id_two, data in df.head(100).groupby(level='id_two'):
    print(id_one, id_two, data.values.ravel())
How can I print id_one, id_two and the data?
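You can pass a list of index level names into the level parameter. Each group key is then a tuple, so unpack it together with the data:
for (id_one, id_two), data in df.head(100).groupby(level=['id_one', 'id_two']):
    print(id_one, id_two, data.values.ravel())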
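As a small runnable illustration of grouping on two index levels, the sketch below builds a four-row MultiIndex frame from the first rows of the question's table and collects (id_one, id_two, values) tuples. The name records is illustrative.

```python
import pandas as pd

# A few rows from the question's table, rebuilt with a four-level index.
index = pd.MultiIndex.from_tuples(
    [(18273, 50, 1, 3), (18273, 50, 1, 4), (18273, 60, 2, 5), (18273, 60, 2, 6)],
    names=["id_one", "id_two", "id_three", "date"],
)
df = pd.DataFrame(
    {"day1": [9, 26, 69, 69], "day2": [11, 27, 93, 30], "day3": [3, 68, 84, 12]},
    index=index,
)

# Grouping on two levels yields a (id_one, id_two) tuple as the group key.
records = [
    (id_one, id_two, data.to_numpy().ravel())
    for (id_one, id_two), data in df.groupby(level=["id_one", "id_two"])
]

for id_one, id_two, values in records:
    print(id_one, id_two, values)
```

ravel flattens each group's values row by row, so the first record holds the three day values for date 3 followed by those for date 4.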
