I need to parse some text and integers from a file

I'm having trouble with a problem I'm working on. My goal is to read a file that contains football team names followed by each team's number of wins and losses. If a team's average is greater than .500, I have to write that team's name and average to a new file, and I have to write the teams under .500 to a separate file. So far my code reads each line of the file, but I can't figure out how to analyze each line. I'm really just looking for any advice I can get at this point, and it would be greatly appreciated.
scores = open("fbscores.txt", 'r')
eachline = scores.readline()
while eachline != "":
    print(eachline)
    eachline = scores.readline()
scores.close()

Given an example line of the file like this:
Cowboys 4 1
Then some code might look like this:
line = scores.readline().split()
teamName = line[0]
wins = int(line[1])
losses = int(line[2])
# goodTeams and badTeams are output files already opened for writing
if wins > losses:
    print(teamName + "'s record is over .500!")
    goodTeams.write(" ".join(line) + "\n")  # write the record and end the line
else:
    print(teamName + "'s record is <= .500.")
    badTeams.write(" ".join(line) + "\n")
To find the average of a team with x wins and y losses, do (in Python 3, where / is true division):
"%0.3f" % (x/(x+y))
Gives:
>>> "%0.3f" % (4/(4+1))
'0.800'
>>>
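Putting the pieces above together, a minimal Python 3 sketch of the whole task might look like the following. The helper name split_by_record, the default output file names over500.txt and under500.txt, and the choice to treat a team at exactly .500 as "not over" are assumptions for illustration, not part of the original question.

```python
def split_by_record(in_path, over_path="over500.txt", under_path="under500.txt"):
    """Write each team's name and win average to one of two files.

    Assumes every non-empty line looks like "Cowboys 4 1" (name, wins, losses)
    and that team names contain no spaces.
    """
    with open(in_path) as scores, \
            open(over_path, "w") as over, \
            open(under_path, "w") as under:
        for line in scores:
            fields = line.split()
            if len(fields) != 3:
                continue  # skip blank or malformed lines
            name, wins, losses = fields[0], int(fields[1]), int(fields[2])
            games = wins + losses
            average = wins / games if games else 0.0
            # Teams at exactly .500 go to the "under" file in this sketch.
            target = over if average > 0.5 else under
            target.write("%s %0.3f\n" % (name, average))
```

Calling split_by_record("fbscores.txt") would then produce the two files next to the script. Multi-word team names would need a different split, for example line.rsplit(None, 2).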

You are most likely going to end up using the split method, which breaks a string into a list of pieces, starting a new piece each time it encounters a given separator. Called with no argument, it splits on any run of whitespace. Read more here: http://www.pythonforbeginners.com/dictionary/python-split.

You could use a high-level library such as pandas.
It has many of the functionalities you want!
import pandas as pd
df = pd.read_csv('file.csv')
print(df)

It's really helpful if you post some lines from your input.
From your description, I'm assuming each line in your input has the following layout:
[NAME] X Y
where NAME is the team name, X is the number of wins, and Y is the number of losses.
You can simply split the line on its delimiter (whitespace in this example):
eachline = eachline.split()
If there were a comma separator, you would do eachline = eachline.split(',') and so on; you get the idea.
Then eachline will store a list with the structure ['NAME', 'X', 'Y']. Note that every element is still a string.
You can then access X and Y with eachline[1] and eachline[2] respectively, converting each with int() before averaging.

Related

Python help. Finding largest value in a file and printing out value w name

I need to create a program that opens a file, reads the values inside it, and then prints out the name with the largest value.
The file contains the following info:
Juan,27
Joe,16
Mike,29
Roy,10
Now the code I have is as follows:
UserFile=input('enter file name')
FileOpen=open(User File,'r')
for lines in User File:
data=line.split(",")
name=data[0]
hrs=data[1]
hrs=int(hrs)
LHRS = 0
if hrs > LHRS:
LHRS = hrs
if LHRS == LHRS:
print('Person with largest hours is',name)
The following prints out :
Person with the largest hours is Juan
Person with the largest hours is Mike
How can I make it so it only prints out the true largest?

Your effort is pretty impressive for a first-timer. What the code doesn't do is keep track of the name while keeping track of the max value. I'm sure it can be done your way, but might I suggest an alternative?
import operator
Let's read in the file as I've done here. This is good practice: the with statement handles closing the file, which can cause many problems if not done properly.
with open('/Users/abhishekbabuji/Desktop/example.txt', 'r') as fh:
    lines = fh.readlines()
Now I have each line in a list called lines, but each one still ends with an annoying \n. Let's replace that with the empty string '':
lines = [line.replace("\n", "") for line in lines]
Now we have a list like this: ['Name1,Value1', 'Name2,Value2', ...]. For each string in the list, I split it on the comma, take the first part as the key and the integer value of the second part as the value, and add the pair to a dictionary called example_dict. So for 'Name1,Value1', Name1 is the item at index 0 and Value1 is the item at index 1 of the split list.
example_dict = {}
for text in lines:
    example_dict[text.split(",")[0]] = int(text.split(",")[1])
print(example_dict)
Gives:
{'Juan': 27, 'Joe': 16, 'Mike': 29, 'Roy': 10}
Now, obtain the key whose value is max and print it.
largest_hour = max(example_dict.items(), key=operator.itemgetter(1))[1]
highest_key = []
for person, hours in example_dict.items():
    if hours == largest_hour:
        highest_key.append((person, hours))
for pair in highest_key:
    print('Person with largest hours is:', pair[0])
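For comparison, the same result can be had by remembering the best name alongside the best value while reading, which is the piece the question's loop was missing. This is a sketch in Python 3; the function name largest_entry is made up for illustration, and it assumes every non-empty line has the form name,integer.

```python
def largest_entry(lines):
    """Return (name, hours) for the largest hours value, or None if empty."""
    best = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        name, hrs = line.split(",")
        hrs = int(hrs)
        # Keep the first entry seen when values tie.
        if best is None or hrs > best[1]:
            best = (name, hrs)
    return best


sample = ["Juan,27\n", "Joe,16\n", "Mike,29\n", "Roy,10\n"]
name, hours = largest_entry(sample)
print("Person with largest hours is", name)  # prints: Person with largest hours is Mike
```

An open file can be passed in place of the list, since iterating over a file yields its lines.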

How would I take the number of names in a list and then write the results to a file?

I am fairly new to Python and am having difficulties with this (most likely simple) problem. I'm accepting a file with the following format:
name_of_sports_team year_they_won_championship
e.g.,
1991 Minnesota
1992 Toronto
1993 Toronto
They are already separated into a nested list [year][name]. I am tasked to add up all the repetitions from the list and display them as such in a new file.
Toronto 2
Minnesota 1
My code is as follows-
def write_tab_seperated(n):
    '''
    N is the filename
    '''
    file = open(n, "w")
    # names are always in the second position?
    data[2] = names
    countnames = ()
    # counting the names
    for x in names:
        # make sure they are all the same
        x = str(name).lower()
        # add one if it shows.
        if x in countnames:
            countnames[x] += 1
        else:
            countnames[x] = 1
    # finish writing the file
    file.close
This is so wrong it's funny, but I planned out where to go from here:
Take the file
separate into the names list
add 1 for each repetition
display in name(tab)number format
close the file.
Any help is appreciated and thank you in advance!

The standard library has a data type that's perfect for your use case: collections.Counter.
I'm assuming from the sample I/O formatting that your data file columns are tab-separated. In the question text it looks like four spaces; if that's the case, just change '\t' to ' ' or ' '*4 below.
with open('data.tsv') as f:
    lines = [l.strip().split('\t') for l in f]
Once you've read the data in, it really is as simple as passing it to a Counter and specifying that it should create counts on the values in the second column.
from collections import Counter
c = Counter(x[1] for x in lines)
And printing them back out for reference:
for k, v in c.items():
    print('{}\t{}'.format(k, v))
Output:
Minnesota 1
Toronto 2
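The question also asks for the counts to be written to a new file, most frequent first. A possible sketch uses Counter.most_common() for the ordering; the function name write_counts and the file paths passed to it are assumptions, and the input is assumed to be tab-separated year/name lines.

```python
from collections import Counter


def write_counts(in_path, out_path):
    """Count the names in the second column and write name<TAB>count lines."""
    with open(in_path) as f:
        rows = [line.strip().split("\t") for line in f if line.strip()]
    counts = Counter(row[1] for row in rows)
    with open(out_path, "w") as out:
        # most_common() yields (name, count) pairs, highest count first.
        for name, count in counts.most_common():
            out.write("{}\t{}\n".format(name, count))
    return counts
```

With the three sample rows from the question, the output file would list Toronto with 2 before Minnesota with 1.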

From what I understand of your explanation, here is my code:
# input.txt is the input file with <year><tab><city> data
with open('input.txt', 'r') as f:
    input_list = [x.strip().split('\t') for x in f]
output_dict = {}
for per_item in input_list:
    if per_item[1] in output_dict:
        output_dict[per_item[1]] += 1
    else:
        output_dict[per_item[1]] = 1
# output file has <city><tab><number of occurrences>
with open("output.txt", "w") as file_output:
    for per_val in output_dict:
        file_output.write(per_val + "\t" + str(output_dict[per_val]) + "\n")
Let me know if it helps.

One of the great things about python is the huge number of packages. For handling tabular data, I'd recommend using pandas and the csv format:
import pandas as pd
years = list(range(1990, 1994))
names = ['Toronto', 'Minnesota', 'Boston', 'Toronto']
dataframe = pd.DataFrame(data={'years': years, 'names': names})
dataframe.to_csv('path/to/file.csv')
That being said, I would still highly recommend going through your code and learning how these things are done from scratch.

I want to convert list items to separate integers & add them after

I have a file full of lines containing words and numbers.
I want to find these numbers and then sum them.
What I wrote is:
import re
fhand = open('ReSample.txt')
for line in fhand:
    y = re.findall('[0-9]+', line)
    for item in y:
        item = int(item)
        total = total + item
print total
The error is that total is not defined!
Sample lines from the file:
Writing programs (or programming) is a very creative and rewarding
activity. You can write programs for 3036 many reasons, ranging from
making your living to solving 7209 a difficult data analysis problem
to having fun to helping
Desired output >>> 3036 + 7209 + ......
Can you fix my code without critical changes please?
Thanks in advance..

You are trying to add item to the variable total and assign the sum back to total, but on the first pass through your loop total is not yet defined. Initialize it to 0 before the loop.
Please consider this approach:
import re
fhand = open('ReSample.txt')
total = 0
for line in fhand:
    y = re.findall('[0-9]+', line)
    for item in y:
        item = int(item)
        total = total + item
print(total)
The output is 10245.
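The same total can also be computed in one expression by feeding the matches to sum(). This sketch uses an in-memory copy of two of the sample lines so it runs without ReSample.txt; the function name sum_numbers is an assumption.

```python
import re


def sum_numbers(lines):
    """Add up every run of digits found in the given lines."""
    return sum(int(n) for line in lines for n in re.findall(r"[0-9]+", line))


sample = [
    "activity. You can write programs for 3036 many reasons, ranging from",
    "making your living to solving 7209 a difficult data analysis problem",
]
print(sum_numbers(sample))  # 3036 + 7209 = 10245
```

Because sum() starts from 0, there is no separate variable to initialize. An open file can be passed instead of the list.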

UPDATE: Calculate vector length according to str value in specific column in Python

I am trying to measure the length of vectors based on a value of the first column of my input data.
For instance: my input data is as follows:
dog nmod+n+-n 4
dog nmod+n+n-a-commitment-n 6
child into+ns-j+vn-pass-rb-divide-v 3
child nmod+n+ns-commitment-n 5
child nmod+n+n-pledge-n 3
hello nmod+n+ns 2
The value that I want to calculate is based on an identical value in the first column. For instance, I would calculate a value based on all rows in which dog is in the first column, then I would calculate a value based on all rows in which child is in the first column... and so on.
I have worked out the mathematics to calculate the vector length (Euc. norm). However, I am unsure how to base the calculation based on grouping the identical values in the first column.
So far, this is the code that I have written:
#!/usr/bin/python
import os
import sys
import getopt
import datetime
import math
print "starting:",
print datetime.datetime.now()
def countVectorLength(infile, outfile):
    with open(infile, 'rb') as inputfile:
        flem, _, fw = next(inputfile).split()
        current_lem = flem
        weights = [float(fw)]
        for line in inputfile:
            lem, _, w = line.split()
            if lem == current_lem:
                weights.append(float(w))
            else:
                print current_lem,
                print math.sqrt(sum([math.pow(weight,2) for weight in weights]))
                current_lem = lem
                weights = [float(w)]
        print current_lem,
        print math.sqrt(sum([math.pow(weight,2) for weight in weights]))
print "Finish:",
print datetime.datetime.now()
path = '/Path/to/Input/'
pathout = '/Path/to/Output'
listing = os.listdir(path)
for infile in listing:
    outfile = 'output' + infile
    print "current file is:" + infile
    countVectorLength(path + infile, pathout + outfile)
This code outputs the length of vector of each individual lemma. The above data gives me the following output:
dog 7.211102550927978
child 6.48074069840786
hello 2
UPDATE
I have been working on it and have managed to get working code, which is the updated sample above. However, as you can see, the code has a problem with the output of the very last line of each file, which I have solved rather crudely by adding it manually. Because of this, it does not allow a clean iteration through the directory; instead, the results of all files end up in one appended document. Is there a cleaner, more Pythonic way to write each result directly to its own corresponding file in the output directory?

First, you need to transform the input into something like
dog => [4, 6]
child => [3, 5, 3]
etc.
It goes like this:
from collections import defaultdict
data = defaultdict(list)
for line in file:
    line = line.split('\t')
    data[line[0]].append(float(line[2]))
Once this is done, the rest is straightforward:
import math
def vector_len(vec):
    # same Euclidean norm as in the question
    return math.sqrt(sum(w ** 2 for w in vec))
vector_lens = {name: vector_len(values) for name, values in data.items()}
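Filled in, the grouping approach might look like this Python 3 sketch. It splits on any whitespace instead of tabs, and the function name vector_lengths is an assumption; the norm is the same square root of summed squares used in the question.

```python
import math
from collections import defaultdict


def vector_lengths(lines):
    """Map each first-column value to the Euclidean norm of its weights."""
    data = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        lemma, _, weight = fields
        data[lemma].append(float(weight))
    return {lemma: math.sqrt(sum(w * w for w in weights))
            for lemma, weights in data.items()}


sample = [
    "dog nmod+n+-n 4",
    "dog nmod+n+n-a-commitment-n 6",
    "hello nmod+n+ns 2",
]
for lemma, length in vector_lengths(sample).items():
    print(lemma, length)
```

Because the rows are grouped in a dictionary, the last group needs no special handling and the input does not have to be sorted by the first column. Writing one output file per input file then comes down to calling the function once per file and writing its result.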

Sorting and aligning the contents of a text file in Python

In my program I have a text file that I read from and write to. However, I would like to display the contents of the text file in an aligned and sorted manner. The contents currently read:
Emily, 6
Sarah, 4
Jess, 7
This is my code where the text file is read and printed:
elif userCommand == 'V':
    print "High Scores:"
    scoresFile = open("scores1.txt", 'r')
    scores = scoresFile.read().split("\n")
    for score in scores:
        print score
    scoresFile.close()
Would I have to convert this information into lists in order to be able to do this? If so, how do I go about doing this?
When writing to the file, I have added a '\n' character to the end, as each record should be printed on a new line.
Thank you

You could use the csv module to read the file and then sorted to sort the rows.
Let's say scores1.txt contains the following:
Richard,100
Michael,200
Ricky,150
Chaung,100
Test
import csv
reader = csv.reader(open("scores1.txt"), dialect='excel')
items = sorted(reader)
for x in items:
    print x[0], x[1]
...
Emily 6
Jess 7
Sarah 4

Looks like nobody's answered the "aligned" part of your request. Also, it's not clear whether you want the results sorted alphabetically by name, or rather by score. In the first case, alphabetical order (assuming Python 2.6):
with open("scores1.txt", 'r') as scoresFile:
    names_scores = [[x.strip() for x in l.split(',', 1)] for l in scoresFile]
# compute column widths
name_width = max(len(name) for name, score in names_scores)
score_width = max(len(score) for name, score in names_scores)
# sort and print
names_scores.sort()
for name, score in names_scores:
    print "%*s %*s" % (name_width, name, score_width, score)
If you want descending order by score, just change the names_scores.sort() line to two:
def getscore_int(name_score): return int(name_score[1])
names_scores.sort(key=getscore_int, reverse=True)
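In Python 3 the same alignment can be written with str.format width specifiers. This is a sketch; the function name format_scores is an assumption, and it left-aligns names and right-aligns scores, sorted by descending score.

```python
def format_scores(lines):
    """Return aligned "name score" rows, highest score first."""
    rows = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        name, score = (part.strip() for part in line.split(",", 1))
        rows.append((name, int(score)))
    if not rows:
        return []
    rows.sort(key=lambda row: row[1], reverse=True)
    name_width = max(len(name) for name, _ in rows)
    score_width = max(len(str(score)) for _, score in rows)
    # "<" left-aligns the name, ">" right-aligns the score.
    return ["{:<{nw}} {:>{sw}}".format(name, score, nw=name_width, sw=score_width)
            for name, score in rows]


for row in format_scores(["Emily, 6", "Sarah, 4", "Jess, 7"]):
    print(row)
```

Converting the score to int before sorting matters: sorted as strings, "10" would come before "9".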

To sort things in Python, you can use sort()/sorted().
To print, you can use print with format specifiers, str.rjust/str.ljust, pprint, etc.
