Write to file .txt - python

This is how the output should look in the text file:
Ray Holt
5 5 0 0 100 15
Jessica Jones
12 0 6 6 50 6
Johnny Rose
6 2 0 4 20 10
Gina Linetti
7 4 0 3 300 15
The numbers are the results from the game that I have to create.
My question is: how can I write both string and integer results to the text file?
I have tried this:
def write_to_file(filename, player_list):
    output = open(filename, "w")
    for player in player_list:
        output.write(str(player))
but the output is
Ray Holt 5 5 0 0 100 15Jessica Jones 12 0 6 6 50 6Johnny Rose 6 2 0 4 20 10Gina Linetti 7 4 0 3 300 15Khang 0 0 0 0 100 0
Everything ends up on one line.
Please help me!
Thanks a lot, guys.

Use this:
def write_to_file(filename, player_list):
    output = open(filename, "w")
    for player in player_list:
        # Append a newline so each player ends up on its own line
        output.write(str(player) + '\n')
    output.close()
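A variant of the same idea using a with-statement, which closes the file even if a write fails. This is only a sketch: the player strings and the temporary file path below are made up for illustration, and it assumes that str(player) already produces the text wanted for each player.

```python
import os
import tempfile


def write_to_file(filename, player_list):
    # The with-statement closes the file automatically.
    with open(filename, "w") as output:
        for player in player_list:
            # str() is applied by the f-string; "\n" ends the line.
            output.write(f"{player}\n")


# Hypothetical sample data standing in for the real player objects.
players = ["Ray Holt 5 5 0 0 100 15", "Jessica Jones 12 0 6 6 50 6"]
path = os.path.join(tempfile.mkdtemp(), "players.txt")
write_to_file(path, players)

with open(path) as check:
    lines = check.read().splitlines()
print(lines)
```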

Related

Python: deleting the first n terms in a string

I have a .txt file where I want to delete the first 7 characters (spaces included) from every line.
I've tried the following:
with open('input_nnn.txt', 'r') as input:
    with open('input_nnnn.txt', 'a') as output:
        for line in input:
            output.write(line[6])
            output.write('\n')
However, I end up with this error:
File "spinf.py", line 4, in <module>
out.write(line[6])
IndexError: string index out of range
To make my question clearer, let's say I have a file that looks like this:
1 z 3 4 5 a 7 seven 8 9 0 11 2
1 z 3 4 5 a 7 seven 8 9 0 11 2
1 z 3 4 5 a 7 seven 8 9 0 11 2
1 z 3 4 5 a 7 seven 8 9 0 11 2
1 z 3 4 5 a 7 seven 8 9 0 11 2
1 z 3 4 5 a 7 seven 8 9 0 11 2
I'd want my output to look like this:
5 a 7 seven 8 9 0 11 2
5 a 7 seven 8 9 0 11 2
5 a 7 seven 8 9 0 11 2
5 a 7 seven 8 9 0 11 2
5 a 7 seven 8 9 0 11 2
5 a 7 seven 8 9 0 11 2
The error seems to indicate that at least one of your lines in the file does not have 7 or more characters.
Maybe adding a check on the length of the string is a good idea.
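A minimal sketch of why the original attempt fails and how slicing avoids it: line[6] asks for a single character and raises IndexError on short lines, whereas a slice such as line[7:] never raises and simply returns an empty string. The helper name drop_prefix and the sample lines are hypothetical.

```python
def drop_prefix(line, n=7):
    """Return line without its first n characters, keeping one trailing newline."""
    body = line.rstrip("\n")
    # Slicing past the end of a string yields "", not an IndexError.
    return body[n:] + "\n"


# Hypothetical sample lines: one long enough, one shorter than 7 characters.
lines = ["1234567rest\n", "abc\n"]
cleaned = [drop_prefix(line) for line in lines]
print(cleaned)
```

With this approach no explicit length check is needed; a short line simply becomes an empty line.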
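Reading and writing simultaneously to the same file is not going to work well in Python (at least with the standard library), because you only have low-level control over the file. Your code would look really weird and possibly have bugs. Write to a new file and then replace the original instead:
import os

with open('input_nnn.txt', 'r') as original_file:
    with open('input_nnn.txt.new', 'w') as new_file:
        for line in original_file:
            # Each line already ends with '\n'; strip it before slicing
            # so short lines keep their line break and none is doubled.
            new_file.write(line.rstrip('\n')[7:] + '\n')
# Replace the original only after both files have been closed
os.replace('input_nnn.txt.new', 'input_nnn.txt')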
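You could also do this directly from bash using cut (-c8- keeps everything from the 8th character onward, i.e. drops the first 7):
cut -c8- <file1.txt >file1.txt.new
cp -f file1.txt.new file1.txt
rm file1.txt.new
Also, don't use input as a variable name; it shadows the built-in input() function.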

joining a table to another table in pandas

I am trying to grab the data from https://www.espn.com/nhl/standings
When I grab it, Florida Panthers ends up one row too high, which misaligns the data. All the team names need to be shifted down a row. I have tried shifting the data with
dataset_one = dataset_one.shift(1)
and then joining with the stats table, but I am getting NaN.
The docs show a lot of ways of joining and merging data with similar column headers, but I am not sure of the best solution here, where there is no common column header to join on.
Code:
import pandas as pd
page = pd.read_html('https://www.espn.com/nhl/standings')
dataset_one = page[0] # Team Names
dataset_two = page[1] # Stats
combined_data = dataset_one.join(dataset_two)
print(combined_data)
Output:
   FLAFlorida Panthers       GP  W  L  OTL  ...  GF  GA  DIFF  L10    STRK
0  CBJColumbus Blue Jackets  6   5  0  1    ...  22  16  6     5-0-1  W2
1  CARCarolina Hurricanes    10  4  3  3    ...  24  28  -4    4-3-3  L1
2  DALDallas Stars           6   5  1  0    ...  18  10  8     5-1-0  W4
3  TBTampa Bay Lightning     6   4  1  1    ...  23  14  9     4-1-1  L2
4  CHIChicago Blackhawks     6   4  1  1    ...  19  14  5     4-1-1  W1
5  NSHNashville Predators    10  3  4  3    ...  26  31  -5    3-4-3  W1
6  DETDetroit Red Wings      8   4  4  0    ...  20  24  -4    4-4-0  L1
Desired:
GP W L OTL ... GF GA DIFF L10 STRK
0 FLAFlorida Panthers 6 5 0 1 ... 22 16 6 5-0-1 W2
1 CBJColumbus Blue Jackets 10 4 3 3 ... 24 28 -4 4-3-3 L1
2 CARCarolina Hurricanes 6 5 1 0 ... 18 10 8 5-1-0 W4
3 DALDallas Stars 6 4 1 1 ... 23 14 9 4-1-1 L2
4 TBTampa Bay Lightning 6 4 1 1 ... 19 14 5 4-1-1 W1
5 CHIChicago Blackhawks 10 3 4 3 ... 26 31 -5 3-4-3 W1
6 NSHNashville Predators 8 4 4 0 ... 20 24 -4 4-4-0 L1
7 DETDetroit Red Wings 10 2 6 2 6 ... 20 35 -15 2-6-2 L6
Providing an alternative approach to @Noah's answer: you can first add an extra row, shift the DataFrame down by a row, and then assign the header label as the value of row 0.
import pandas as pd
page = pd.read_html('https://www.espn.com/nhl/standings')
dataset_one = page[0] # Team Names
dataset_two = page[1] # Stats
# Shifting down by one row
dataset_one.loc[max(dataset_one.index) + 1, :] = None
dataset_one = dataset_one.shift(1)
dataset_one.iloc[0] = dataset_one.columns
dataset_one.columns = ['team']
combined_data = dataset_one.join(dataset_two)
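Just create the df slightly differently so it knows what the proper header is. page[0] is already a DataFrame whose only column label is the first team name, so put that label back in front of the column's values:
dataset_one = pd.DataFrame({"Team Name": [page[0].columns[0], *page[0].iloc[:, 0]]})
Then when you join, it should be aligned properly.
Another alternative is to concatenate the header label with the existing values and convert the result with to_frame (which is a Series method, not a DataFrame method):
dataset_one = pd.concat([pd.Series(page[0].columns), page[0].iloc[:, 0]], ignore_index=True).to_frame(name='Team Name')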
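To check the alignment without fetching the ESPN page, here is a small sketch on synthetic data. The two DataFrames below are made-up stand-ins for page[0] and page[1] (values taken from the sample output in the question), and it assumes the only problem is that the first team was read as the column label.

```python
import pandas as pd

# Stand-in for page[0]: the first team name was misread as the column label.
teams = pd.DataFrame(
    {"FLAFlorida Panthers": ["CBJColumbus Blue Jackets", "CARCarolina Hurricanes"]}
)
# Stand-in for page[1]: one stats row per team, including the first team.
stats = pd.DataFrame({"GP": [6, 10, 6], "W": [5, 4, 5]})

# Put the misread label back in front of the remaining team names.
names = [teams.columns[0], *teams.iloc[:, 0]]
fixed = pd.DataFrame({"team": names})

# Both frames now share the default 0..n index, so join aligns row by row.
combined = fixed.join(stats)
print(combined)
```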

Group by fuzzy string matches with fuzzywuzzy and groupby

I have a dataset of random words and names and I am trying to group all of the similar words and names. So given the dataframe below:
    Name         ID  Value
0   James        1   10
1   James 2      2   142
2   Bike         3   1
3   Bicycle      4   1197
4   James Marsh  5   12
5   Ants         6   54
6   Job          7   6
7   Michael      8   80007
8   Arm          9   47
9   Mike K       10  9
10  Michael k    11  1
My pseudo code would be something like:
import pandas as pd
from fuzzywuzzy import fuzz
minratio = 95
for idx1, name1 in df['Name'].iteritems():
    for idx2, name2 in df['Name'].iteritems():
        ratio = fuzz.WRatio(name1, name2)
        if ratio > minratio:
            grouped = df.groupby(['Name', 'ID'])['Value']\
                        .agg(Total_Value='sum', Group_Size='count')
This would then give me the desired output:
print(grouped)
   Name     ID  Total_Value  Group_Size
0  James    1   164          3  # All James' grouped
2  Bike     3   1198         2  # Bike's and Bicycles grouped
5  Ants     6   54           1
6  Job      7   6            1
7  Michael  8   80017        3  # Mike's and Michael's grouped
8  Arm      9   47           1
Obviously this doesn't work, and honestly, I am not sure if this is even possible, but this is what I'm trying to accomplish. Any advice that could get me on the right track would be useful.
Using affinity propagation clustering (not perfect but maybe a starting point):
import pandas as pd
import numpy as np
import io
from fuzzywuzzy import fuzz
from scipy import spatial
import sklearn.cluster
# Columns are separated by two or more spaces so that names
# containing a single space (e.g. "James Marsh") stay in one field.
s = """Name  ID  Value
0   James        1   10
1   James 2      2   142
2   Bike         3   1
3   Bicycle      4   1197
4   James Marsh  5   12
5   Ants         6   54
6   Job          7   6
7   Michael      8   80007
8   Arm          9   47
9   Mike K       10  9
10  Michael k    11  1"""
df = pd.read_csv(io.StringIO(s), sep=r'\s\s+', engine='python')
names = df.Name.values
# Pairwise WRatio similarity between all names (condensed form)
sim = spatial.distance.pdist(names.reshape((-1,1)), lambda x,y: fuzz.WRatio(x,y))
affprop = sklearn.cluster.AffinityPropagation(affinity="precomputed", random_state=None)
affprop.fit(spatial.distance.squareform(sim))
# Aggregate the rows that ended up in the same cluster
res = df.groupby(affprop.labels_).agg(
    Names=('Name', ','.join),
    First_ID=('ID', 'first'),
    Total_Value=('Value', 'sum'),
    Group_Size=('Value', 'count')
)
Result
   Names                                 First_ID  Total_Value  Group_Size
0  James,James 2,James Marsh,Ants,Arm    1         265          5
1  Bike,Bicycle                          3         1198         2
2  Job                                   7         6            1
3  Michael,Mike K,Michael k              8         80017        3
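For comparison, here is a much simpler grouping sketch that needs only the standard library. It is not the method above: it swaps fuzzywuzzy for difflib.SequenceMatcher and affinity propagation for a greedy pass that compares each name with the first member of every existing group. The function name group_similar and the 0.8 threshold are arbitrary choices for illustration, and the grouping depends on input order.

```python
from difflib import SequenceMatcher


def group_similar(names, threshold=0.8):
    """Greedily group names whose similarity to a group's first member >= threshold."""
    groups = []  # each group is a list; its first element acts as representative
    for name in names:
        for group in groups:
            score = SequenceMatcher(None, name.lower(), group[0].lower()).ratio()
            if score >= threshold:
                group.append(name)
                break
        else:
            # No existing group was similar enough: start a new one.
            groups.append([name])
    return groups


groups = group_similar(["Michael", "Michael k", "Arm", "James", "James"])
print(groups)
```

Each resulting group could then be mapped to a label and passed to df.groupby, in the same way that affprop.labels_ is used above.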

How do I give an error message if condition is not met?

I'm doing an assignment for a basic programming course.
I have a dataframe (loaded from a CSV file) containing the columns:
    StudentID  Name                       Assignment1  Assignment2  Assignment3
0   s123456    Michael Andersen           7            7            4
1   s123789    Bettina Petersen           12           10           10
2   s123468    Thomas Nielsen             -3           7            2
3   s123579    Marie Hansen               10           12           12
4   s123579    Marie Hansen               10           12           12
5   s127848    Andreas Nielsen            2            2            2
6   s120799    Mads Westergaard           12           12           10
7   s123456    Michael Andersen           7            7            4
8   S184507    Andreas Døssing Mortensen  2            2            4
9   S129834    Jonas Jonassen             0            -3           4
10  S123481    Milad Mohammed             12           10           7
11  S128310    Abdul Jihad                10           4            7
12  S125493    Søren Sørensen             0            7            7
13  S128363    123                        4            7            10
14  S127463    Jensen Jensen              5            2            10
15  S120987    Jeff Bezos                 12           12           12
I need my program to give an error message if a condition is not met: in this instance, if a student appears in the dataframe more than once, or if a grade given for an assignment is not on the grade scale (-3, 0, 2, 4, 7, 10, 12).
The assignment is as follows:
If the user chooses to check for data errors, you must display a report of errors (if any) in the loaded data file. Your program must at least detect and display information about the following possible errors:
1. If two students in the data have the same student id.
2. If a grade in the data set is not one of the possible grades on the 7-step-scale.
How can I do this?
I have tried to solve it like this, but with no luck:
doubles = dataDuplicate["Name"].duplicated()
print(doubles)
grades = np.array([-3,0,2,4,7,10,12])
dataSortGrades = dataSortGrades.iloc[:,2:] #this gives
gradesNotInList = np.isin(dataSortGrades,grades)
if dataDuplicate["Name"] in doubles == True:
    print("Error")
else:
    print(#list of false values")
The standard approach is:
if condition:
    print(message)
where condition (or not condition) and message should be adjusted to your specific needs.
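As a sketch of how that pattern could apply to the two checks in the assignment, the following uses plain Python rather than pandas. The record layout (student ID, name, list of grades), the function name find_errors, and the sample records are all assumptions made for illustration.

```python
from collections import Counter

VALID_GRADES = {-3, 0, 2, 4, 7, 10, 12}


def find_errors(records):
    """Return a list of error messages for (student_id, name, grades) records."""
    errors = []
    # Condition 1: the same student ID appears more than once.
    counts = Counter(student_id for student_id, _, _ in records)
    for student_id, n in counts.items():
        if n > 1:
            errors.append(f"Student ID {student_id} appears {n} times")
    # Condition 2: a grade is not on the 7-step scale.
    for student_id, name, grades in records:
        for grade in grades:
            if grade not in VALID_GRADES:
                errors.append(f"Invalid grade {grade} for {student_id} ({name})")
    return errors


# Hypothetical sample records based on rows from the question.
records = [
    ("s123456", "Michael Andersen", [7, 7, 4]),
    ("s123456", "Michael Andersen", [7, 7, 4]),
    ("S127463", "Jensen Jensen", [5, 2, 10]),
]
errors = find_errors(records)
for message in errors:
    print(message)
```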
You don't need to create 3 data frames. You can just create one dataframe and then perform your selections based on your conditions.
import pandas as pd
import re
data = """0  s123456  Michael Andersen  7  7  4
1  s123789  Bettina Petersen  12  10  10
2  s123468  Thomas Nielsen  -3  7  2
3  s123579  Marie Hansen  10  12  12
4  s123579  Marie Hansen  10  12  12
5  s127848  Andreas Nielsen  2  2  2
6  s120799  Mads Westergaard  12  12  10
7  s123456  Michael Andersen  7  7  4
8  S184507  Andreas Døssing Mortensen  2  2  4
9  S129834  Jonas Jonassen  0  -3  4
10  S123481  Milad Mohammed  12  10  7
11  S128310  Abdul Jihad  10  4  7
12  S125493  Søren Sørensen  0  7  7
13  S128363  123  4  7  10
14  S127463  Jensen Jensen  5  2  10
15  S120987  Jeff Bezos  12  12  12"""
# Make the data frame: fields are separated by two or more spaces,
# and [1:] drops the leading row number
data = [re.split(r"\s{2,}", line)[1:] for line in data.splitlines()]
df = pd.DataFrame(data, columns=['StudentID', 'Name', 'Assignment1', 'Assignment2', 'Assignment3'])
# Print rows whose StudentID has already appeared earlier
print('###Duplicate studentIDs###')
print(df[df['StudentID'].duplicated()])
# Print rows with at least one grade outside the 7-step scale
# (grades are strings here because they come from re.split)
valid_grades = ('-3', '0', '2', '4', '7', '10', '12')
print('###Invalid grades###')
print(df[
    ~df['Assignment1'].isin(valid_grades) |
    ~df['Assignment2'].isin(valid_grades) |
    ~df['Assignment3'].isin(valid_grades)
])
OUTPUT
###Duplicate studentIDs###
    StudentID  Name              Assignment1  Assignment2  Assignment3
4   s123579    Marie Hansen      10           12           12
7   s123456    Michael Andersen  7            7            4
###Invalid grades###
    StudentID  Name           Assignment1  Assignment2  Assignment3
14  S127463    Jensen Jensen  5            2            10

list index is out of range - cannot find why

I'm writing code to sort data into lists and then arrange these lists, but I keep getting:
builtins.IndexError: list index out of range
The data in the code is:
Class 1:
Matthew 5 5 1
Paul 9 3 6
Sara 9 4 2
Nicholas 3 2 4
Larry 5 2 5
Philip 4 4 6
Patricia 7 7 7
Gary 0 9 4
Marie 8 7 1
Scott 4 3 2
Class 2:
Heather 10 2 3
Lawrence 2 4 4
Stephen 1 6 8
Robert 3 5 4
Shawn 9 6 6
Michelle 6 7 4
Chris 10 4 2
Teresa 5 5 6
Dennis 7 8 1
Rose 8 3 4
Class 3:
Jojo 10 3 9
Sarah 1 2 8
Jerry 5 3 7
Aaron 7 8 5
Carl 3 7 5
Christine 9 4 4
Jennifer 2 8 2
Linda 8 8 1
Justin 4 6 3
Emily 2 4 7
The code:
import csv
while True:
    ClassNumber = int(input("What class would you like to see the results for?"))
    if 1<= ClassNumber <=3:
        break
    print("That class does not exist. Please choose from 1,2 or 3")
if ClassNumber == 1:
    ClassFile = "class1scores.csv"
elif ClassNumber == 2:
    ClassFile = "class2scores.csv"
else:
    ClassFile = "class3scores.csv"
OpenFile = open(ClassFile, "r")
scores = csv.reader(OpenFile)
newlist = []
for row in scores:
    row[1] = int(row[1])
    row[2] = int(row[2])
    row[3] = int(row[3])
    HighestScore = max(row[0:3])
    row.append(HighestScore)
    AverageScore = round(sum(row[0:3])/3)
    row.append(AverageScore)
By default, csv.reader assumes that a comma is the field delimiter, but your data is tab-delimited. As a result, each row has only one item, not four.
scores = csv.reader(OpenFile, delimiter="\t")
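A small self-contained sketch of the difference, using an in-memory string as a hypothetical stand-in for class1scores.csv. It also assumes the scores were meant to come from row[1:4]: in the question's loop, row[0:3] includes the name and leaves out the third score, so max and sum would fail on the mixed types once the delimiter is fixed.

```python
import csv
import io

# Hypothetical stand-in for class1scores.csv, tab-delimited as in the question.
sample = "Matthew\t5\t5\t1\nPaul\t9\t3\t6\n"

# With the default comma delimiter, the whole line is a single field.
first_row_default = next(csv.reader(io.StringIO(sample)))
print(len(first_row_default))

results = []
for row in csv.reader(io.StringIO(sample), delimiter="\t"):
    name = row[0]
    # Columns 1..3 hold the three scores.
    marks = [int(value) for value in row[1:4]]
    results.append((name, max(marks), round(sum(marks) / 3)))

print(results)
```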
