list index is out of range - cannot find why - python

I'm writing code to sort data into lists and then arrange these lists, but I keep getting:
builtins.IndexError: list index out of range
The data in the code is:
Class 1:
Matthew 5 5 1
Paul 9 3 6
Sara 9 4 2
Nicholas 3 2 4
Larry 5 2 5
Philip 4 4 6
Patricia 7 7 7
Gary 0 9 4
Marie 8 7 1
Scott 4 3 2
Class 2:
Heather 10 2 3
Lawrence 2 4 4
Stephen 1 6 8
Robert 3 5 4
Shawn 9 6 6
Michelle 6 7 4
Chris 10 4 2
Teresa 5 5 6
Dennis 7 8 1
Rose 8 3 4
Class 3:
Jojo 10 3 9
Sarah 1 2 8
Jerry 5 3 7
Aaron 7 8 5
Carl 3 7 5
Christine 9 4 4
Jennifer 2 8 2
Linda 8 8 1
Justin 4 6 3
Emily 2 4 7
The code:
import csv
while True:
    ClassNumber = int(input("What class would you like to see the results for?"))
    if 1<= ClassNumber <=3:
        break
    print("That class does not exist. Please choose from 1,2 or 3")
if ClassNumber == 1:
    ClassFile = "class1scores.csv"
elif ClassNumber == 2:
    ClassFile = "class2scores.csv"
else:
    ClassFile = "class3scores.csv"
OpenFile = open(ClassFile, "r")
scores = csv.reader(OpenFile)
newlist = []
for row in scores:
    row[1] = int(row[1])
    row[2] = int(row[2])
    row[3] = int(row[3])
    HighestScore = max(row[0:3])
    row.append(HighestScore)
    AverageScore = round(sum(row[0:3])/3)
    row.append(AverageScore)

By default, csv.reader assumes that a comma is the field delimiter, but your data is tab-delimited. As a result, each row has only one item, not four, so row[1] raises the IndexError. Pass the delimiter explicitly:
scores = csv.reader(OpenFile, delimiter="\t")
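A minimal sketch of the whole loop with the tab delimiter in place, assuming each line holds a name followed by three tab-separated scores. The function name summarise_scores and the in-memory sample are made up for illustration. It also slices row[1:4] rather than row[0:3], because row[0] is the name and cannot be summed with the scores.

```python
import csv
import io

def summarise_scores(lines):
    """Return [name, s1, s2, s3, highest, average] for each tab-delimited row."""
    reader = csv.reader(lines, delimiter="\t")
    results = []
    for row in reader:
        name = row[0]
        # Columns 1..3 hold the scores; column 0 is the name.
        marks = [int(value) for value in row[1:4]]
        results.append([name, *marks, max(marks), round(sum(marks) / 3)])
    return results

# Hypothetical stand-in for an opened class file.
sample = io.StringIO("Matthew\t5\t5\t1\nPaul\t9\t3\t6\n")
summary = summarise_scores(sample)
print(summary)
```

With a real file, passing the object returned by open(ClassFile, "r", newline="") in place of sample should behave the same way.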

Related

Counting distinct, until a certain condition based on another row is met

I have the following df
Original df
Step | CampaignSource | UserId
1 Banana Jeff
1 Banana John
2 Banana Jefferson
3 Website Nunes
4 Banana Jeff
5 Attendance Nunes
6 Attendance Antonio
7 Banana Antonio
8 Website Joseph
9 Attendance Joseph
9 Attendance Joseph
Desired output
Steps | CampaignSource | CountedDistinctUserid
1 Website 2 (Because of different userids)
2 Banana 1
3 Banana 1
4 Website 1
5 Banana 1
6 Attendance 1
7 Attendance 1
8 Attendance 1
9 Attendance 1 (but i want to have 2 here even tho they have similar user ids and because is the 9th step)
What I want to do is impose a condition: if the Step column (which holds strings) equals '9', count the user ids as non-distinct. Any ideas on how I could do that? I tried applying a function but couldn't make it work.
What i am currently doing:
df[['Steps','UserId','CampaignSource']].groupby(['Steps','CampaignSource'],as_index=False,dropna=False).nunique()
You can group by "Step" and use a condition on the group name:
df.groupby('Step')['UserId'].apply(lambda g: g.nunique() if g.name<9 else g.count())
output:
Step
1 2
2 1
3 1
4 1
5 1
6 1
7 1
8 1
9 2
Name: UserId, dtype: int64
As DataFrame:
(df.groupby('Step', as_index=False)
.agg(CampaignSource=('CampaignSource', 'first'),
CountedDistinctUserid=('CampaignSource', lambda g: g.nunique() if g.name<9 else g.count())
)
)
output:
Step CampaignSource CountedDistinctUserid
0 1 Banana 2
1 2 Banana 1
2 3 Website 1
3 4 Banana 1
4 5 Attendance 1
5 6 Attendance 1
6 7 Banana 1
7 8 Website 1
8 9 Banana 2
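A hedged alternative sketch that does not depend on the group key being available as g.name inside the aggregation: compute both the distinct count and the plain count per step, then pick the plain count only for step 9. It assumes Step holds integers (convert with astype(int) first if it holds strings); the function name count_users_per_step and its plain_count_step parameter are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Step": [1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9],
    "CampaignSource": ["Banana", "Banana", "Banana", "Website", "Banana",
                       "Attendance", "Attendance", "Banana", "Website",
                       "Attendance", "Attendance"],
    "UserId": ["Jeff", "John", "Jefferson", "Nunes", "Jeff", "Nunes",
               "Antonio", "Antonio", "Joseph", "Joseph", "Joseph"],
})

def count_users_per_step(frame, plain_count_step=9):
    grouped = frame.groupby("Step")
    distinct = grouped["UserId"].nunique()  # distinct users per step
    total = grouped["UserId"].count()       # all rows per step
    # Keep the distinct count everywhere except the chosen step.
    counted = distinct.where(distinct.index != plain_count_step, total)
    return pd.DataFrame({
        "CampaignSource": grouped["CampaignSource"].first(),
        "CountedDistinctUserid": counted,
    }).reset_index()

result = count_users_per_step(df)
print(result)
```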
You can apply a different function to each group depending on whether a condition matches.
out = (df[['Steps','UserId','CampaignSource']]
.groupby(['Steps','CampaignSource'],as_index=False,dropna=False)
.apply(lambda g: g.assign(CountedDistinctUserid=( [len(g)]*len(g)
if g['Steps'].eq(9).all()
else [g['UserId'].nunique()]*len(g) ))))
print(out)
Steps UserId CampaignSource CountedDistinctUserid
0 1 Jeff Banana 2
1 1 John Banana 2
2 2 Jefferson Banana 1
3 3 Nunes Website 1
4 4 Jeff Banana 1
5 5 Nunes Attendance 1
6 6 Antonio Attendance 1
7 7 Antonio Banana 1
8 8 Joseph Website 1
9 9 Joseph Attendance 2
10 9 Joseph Attendance 2

How do I give an error message if condition is not met?

I'm doing an assignment for a basic programming course.
I have a dataframe (csv-file) containing the columns:
StudentID Name Assignment1 Assignment2 Assignment3
0 s123456 Michael Andersen 7 7 4
1 s123789 Bettina Petersen 12 10 10
2 s123468 Thomas Nielsen -3 7 2
3 s123579 Marie Hansen 10 12 12
4 s123579 Marie Hansen 10 12 12
5 s127848 Andreas Nielsen 2 2 2
6 s120799 Mads Westergaard 12 12 10
7 s123456 Michael Andersen 7 7 4
8 S184507 Andreas Døssing Mortensen 2 2 4
9 S129834 Jonas Jonassen 0 -3 4
10 S123481 Milad Mohammed 12 10 7
11 S128310 Abdul Jihad 10 4 7
12 S125493 Søren Sørensen 0 7 7
13 S128363 123 4 7 10
14 S127463 Jensen Jensen 5 2 10
15 S120987 Jeff Bezos 12 12 12
I need to make my program give an error message if a condition is not met. In this instance: if a student is in the dataframe more than once, or if the grade given for an assignment is not on the scale of grades (-3, 0, 2, 4, 7, 10, 12).
The assignment is as follows:
If the user chooses to check for data errors, you must display a report of errors (if any) in the loaded data file. Your program must at least detect and display information about the following possible errors:
1. If two students in the data have the same student id.
2. If a grade in the data set is not one of the possible grades on the 7-step-scale.
How can I get around this?
I have tried to solve it like this, but no luck:
doubles = dataDuplicate["Name"].duplicated()
print(doubles)
grades = np.array([-3,0,2,4,7,10,12])
dataSortGrades = dataSortGrades.iloc[:,2:] #this gives
gradesNotInList = np.isin(dataSortGrades,grades)
if dataDuplicate["Name"] in doubles == True:
    print("Error")
else:
    print("#list of false values")
The standard approach is:
if condition:
    print(message)
where condition (or not condition) and message should be adjusted to your specific needs.
You don't need to create 3 data frames. You can just create the dataframe and then perform your selections based on your conditions.
import pandas as pd
import re
data = """0  s123456  Michael Andersen  7  7  4
1  s123789  Bettina Petersen  12  10  10
2  s123468  Thomas Nielsen  -3  7  2
3  s123579  Marie Hansen  10  12  12
4  s123579  Marie Hansen  10  12  12
5  s127848  Andreas Nielsen  2  2  2
6  s120799  Mads Westergaard  12  12  10
7  s123456  Michael Andersen  7  7  4
8  S184507  Andreas Døssing Mortensen  2  2  4
9  S129834  Jonas Jonassen  0  -3  4
10  S123481  Milad Mohammed  12  10  7
11  S128310  Abdul Jihad  10  4  7
12  S125493  Søren Sørensen  0  7  7
13  S128363  123  4  7  10
14  S127463  Jensen Jensen  5  2  10
15  S120987  Jeff Bezos  12  12  12"""
#Make the data frame
data = [re.split(r"\s{2,}", line)[1:] for line in data.splitlines()]
df = pd.DataFrame(data, columns=['StudentID', 'Name', 'Assignment1', 'Assignment2', 'Assignment3'])
#print the duplicates
print(f'###Duplicate studentIDs###')
print(df[df['StudentID'].duplicated()])
#print invalid grades
valid_grades = ('-3', '0', '2', '4', '7', '10', '12')
print(f'###Invalid grades###')
print(df[
    (df['Assignment1'].isin(valid_grades) == False) |
    (df['Assignment2'].isin(valid_grades) == False) |
    (df['Assignment3'].isin(valid_grades) == False)
])
OUTPUT
###Duplicate studentIDs###
StudentID Name Assignment1 Assignment2 Assignment3
4 s123579 Marie Hansen 10 12 12
7 s123456 Michael Andersen 7 7 4
###Invalid grades###
StudentID Name Assignment1 Assignment2 Assignment3
14 S127463 Jensen Jensen 5 2 10
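To turn those selections into an error report, one possible sketch collects a message per problem and prints them. It assumes the grades have been loaded as integers and that the grade columns start with "Assignment". The helper find_data_errors and the small sample frame are made up for illustration and are not part of the assignment.

```python
import pandas as pd

VALID_GRADES = {-3, 0, 2, 4, 7, 10, 12}  # the 7-step scale

def find_data_errors(df, id_col="StudentID"):
    """Return a list of human-readable error messages (empty if none)."""
    errors = []
    # keep=False marks every occurrence of a repeated id, not just the later ones.
    dup_mask = df[id_col].duplicated(keep=False)
    for student_id, rows in df[dup_mask].groupby(id_col):
        errors.append(
            f"Duplicate student id {student_id} in rows {rows.index.tolist()}"
        )
    grade_cols = [c for c in df.columns if c.startswith("Assignment")]
    for col in grade_cols:
        bad = df.loc[~df[col].isin(VALID_GRADES), col]
        for idx, value in bad.items():
            errors.append(
                f"Invalid grade {value} for {df.at[idx, id_col]} in {col} (row {idx})"
            )
    return errors

# Hypothetical three-row sample, not the full data set from the question.
sample = pd.DataFrame({
    "StudentID": ["s123456", "S127463", "s123456"],
    "Name": ["Michael Andersen", "Jensen Jensen", "Michael Andersen"],
    "Assignment1": [7, 5, 7],
    "Assignment2": [7, 2, 7],
})
errors = find_data_errors(sample)
for message in errors:
    print("Error:", message)
```

Comparing ids with df[id_col].str.lower() first would also catch ids that differ only in the case of the leading letter, if that counts as a duplicate for the assignment.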

Write to file .txt

This is how the output should look in the text file:
Ray Holt
5 5 0 0 100 15
Jessica Jones
12 0 6 6 50 6
Johnny Rose
6 2 0 4 20 10
Gina Linetti
7 4 0 3 300 15
The numbers are the results from the game that I have to create.
My question is: how can I write both the string and the integer results to the text file?
I have tried this
def write_to_file(filename, player_list):
    output = open(filename, "w")
    for player in player_list:
        output.write(str(player))
but the output is
Ray Holt 5 5 0 0 100 15Jessica Jones 12 0 6 6 50 6Johnny Rose 6 2 0 4 20 10Gina Linetti 7 4 0 3 300 15Khang 0 0 0 0 100 0
They are in 1 line
Please help me!
Thanks a lot guys
Write a newline after each player, and close the file once the loop has finished:
def write_to_file(filename, player_list):
    output = open(filename, "w")
    for player in player_list:
        output.write(str(player)+'\n')
    output.close()
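The same idea can be sketched with a with block, which closes the file even if writing fails. The players list below is a made-up stand-in for whatever str(player) returns in the real program; putting a "\n" between the name and the numbers is an assumption about how to get the two-line layout shown in the question.

```python
import os
import tempfile

def write_to_file(filename, player_list):
    # The file is closed automatically when the with block ends.
    with open(filename, "w") as output:
        for player in player_list:
            output.write(f"{player}\n")

# Hypothetical stand-ins for str(player): name, newline, then the numbers.
players = ["Ray Holt\n5 5 0 0 100 15", "Jessica Jones\n12 0 6 6 50 6"]
path = os.path.join(tempfile.mkdtemp(), "players.txt")
write_to_file(path, players)

with open(path) as saved:
    contents = saved.read()
print(contents)
```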

sum the values of a group by object

I'm having trouble with some pandas groupby object issue, which is the following:
so I have this dataframe:
Letter name num_exercises
A carl 1
A Lenna 2
A Harry 3
A Joe 4
B Carl 5
B Lenna 3
B Harry 3
B Joe 6
C Carl 6
C Lenna 3
C Harry 4
C Joe 7
And I want to add a column on it, called num_exercises_total , which contains the total sum of num_exercises for each letter. Please note that this value must be repeated for each row in the letter group.
The output would be as follows:
Letter name num_exercises num_exercises_total
A carl 1 10
A Lenna 2 10
A Harry 3 10
A Joe 4 10
B Carl 5 17
B Lenna 3 17
B Harry 3 17
B Joe 6 17
C Carl 6 20
C Lenna 3 20
C Harry 4 20
C Joe 7 20
I've tried adding the new column like this:
df['num_exercises_total'] = df.groupby(['letter'])['num_exercises'].sum()
But it returns the value NaN for all the rows.
Any help would be highly appreciated.
Thank you very much in advance!
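Use transform. A plain sum() returns one value per group, indexed by Letter, so assigning it to the original frame does not line up with the row index and yields NaN. transform returns a result aligned with the original rows: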
df.groupby(['Letter'])['num_exercises'].transform('sum')
0 10
1 10
2 10
3 10
4 17
5 17
6 17
7 17
8 20
9 20
10 20
11 20
Name: num_exercises, dtype: int64
df['num_of_total']=df.groupby(['Letter'])['num_exercises'].transform('sum')
Transform works perfectly for this question. WenYoBen is right. I am just putting a slightly different version here.
df['num_of_total']=df['num_exercises'].groupby(df['Letter']).transform('sum')
>>> df
Letter name num_exercises num_of_total
0 A carl 1 10
1 A Lenna 2 10
2 A Harry 3 10
3 A Joe 4 10
4 B Carl 5 17
5 B Lenna 3 17
6 B Harry 3 17
7 B Joe 6 17
8 C Carl 6 20
9 C Lenna 3 20
10 C Harry 4 20
11 C Joe 7 20
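A small self-contained sketch of the difference, using the data from the question. The variable names per_letter and mapped are made up for illustration, and the map-based line is simply another way to broadcast the per-group sums, not something the answers above proposed.

```python
import pandas as pd

df = pd.DataFrame({
    "Letter": list("AAAABBBBCCCC"),
    "name": ["carl", "Lenna", "Harry", "Joe"] + ["Carl", "Lenna", "Harry", "Joe"] * 2,
    "num_exercises": [1, 2, 3, 4, 5, 3, 3, 6, 6, 3, 4, 7],
})

# sum() gives one row per group, indexed by Letter (A, B, C).
per_letter = df.groupby("Letter")["num_exercises"].sum()

# transform("sum") gives one value per original row, so it can be assigned directly.
df["num_exercises_total"] = df.groupby("Letter")["num_exercises"].transform("sum")

# Equivalent broadcast: look up each row's Letter in the per-group sums.
mapped = df["Letter"].map(per_letter)
print(df)
```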

How to strip the string and replace the existing elements in DataFrame

I have a df as below:
Index Site Name
0 Site_1 Tom
1 Site_2 Tom
2 Site_4 Jack
3 Site_8 Rose
5 Site_11 Marrie
6 Site_12 Marrie
7 Site_21 Jacob
8 Site_34 Jacob
I would like to strip the 'Site_' and only leave the number in the "Site" column, as shown below:
Index Site Name
0 1 Tom
1 2 Tom
2 4 Jack
3 8 Rose
5 11 Marrie
6 12 Marrie
7 21 Jacob
8 34 Jacob
What is the best way to do this operation?
Using pd.Series.str.extract
This produces a copy with an updated column
df.assign(Site=df.Site.str.extract(r'\D+(\d+)', expand=False))
Site Name
Index
0 1 Tom
1 2 Tom
2 4 Jack
3 8 Rose
5 11 Marrie
6 12 Marrie
7 21 Jacob
8 34 Jacob
To persist the results, reassign to the data frame name
df = df.assign(Site=df.Site.str.extract(r'\D+(\d+)', expand=False))
Using pd.Series.str.split
df.assign(Site=df.Site.str.split('_', n=1).str[1])
Alternative
Update instead of producing a copy
df.update(df.Site.str.extract(r'\D+(\d+)', expand=False))
# Or
# df.update(df.Site.str.split('_', n=1).str[1])
df
Site Name
Index
0 1 Tom
1 2 Tom
2 4 Jack
3 8 Rose
5 11 Marrie
6 12 Marrie
7 21 Jacob
8 34 Jacob
Make an array consisting of the names you want. Then call
yourarray = pd.DataFrame(yourpd, columns=yournamearray)
Just call replace on the column to replace all instances of "Site_":
df['Site'] = df['Site'].str.replace('Site_', '')
Use .apply() to apply a function to each element in a series:
df['Site'] = df['Site'].apply(lambda x: x.split('_')[-1])
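You can also use the strip method you asked about. Note that str.strip treats its argument as a set of characters to remove from both ends, not as a prefix; it works here only because what remains is all digits.
>>> df["Site"] = df.Site.str.strip("Site_")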
Output
Index Site Name
0 1 Tom
1 2 Tom
2 4 Jack
3 8 Rose
5 11 Marrie
6 12 Marrie
7 21 Jacob
8 34 Jacob
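For comparison, a short sketch of two of the approaches above on the question's data, with a final conversion to integers in case the site numbers are needed for sorting or arithmetic. The integer conversion is an addition the question did not ask for, and the names by_replace and by_extract are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Site": ["Site_1", "Site_2", "Site_4", "Site_8",
                 "Site_11", "Site_12", "Site_21", "Site_34"],
        "Name": ["Tom", "Tom", "Jack", "Rose",
                 "Marrie", "Marrie", "Jacob", "Jacob"],
    },
    index=[0, 1, 2, 3, 5, 6, 7, 8],
)

# Literal replacement: regex=False treats "Site_" as plain text.
by_replace = df["Site"].str.replace("Site_", "", regex=False)

# Regex extraction: pull out the first run of digits.
by_extract = df["Site"].str.extract(r"(\d+)", expand=False)

# Both results are still strings; convert if numeric values are wanted.
df["Site"] = by_extract.astype(int)
print(df)
```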
