Python: Reading a file and calculating sum and average

The task is to read the file line by line, then calculate and display the sum and average of all the valid numbers in it.
The text file is:
contains text
79.3
56.15
67
6
text again
57.86
6
37.863
text again
456.675
That's all I have so far.
numbers = open('fileofnumbers.txt', 'r')
line = file_contents.readline()
numbers.close()
try:
    sum = line + line
    line = file_contents.readline()
    print "The sum of the numbers is", sum
except ValueError:
    print line

Using the with statement makes dealing with files a lot more intuitive, because the file is closed for you when the block ends.
For instance, change the opening and closing to this:
summation = 0
# Within the with block you now have access to the source variable
with open('fileofnumbers.txt', 'r') as source:
    for line in source: #iterate through all the lines of the file
        try:
            # Since files are read in as strings, you have to cast each line to a float
            summation += float(line)
        except ValueError:
            pass
That might get you started.
If you want to be a little more clever, Python strings have a convenient method called isdigit, which checks whether every character in a string is a digit. That lets you do things like this:
is_number = lambda number: all(part.isdigit() for part in number.strip().split('.', 1))
answer = [float(line) for line in open('fileofnumbers.txt') if is_number(line)]
Which then makes sum and average trivial:
print sum(answer) # Sum
print sum(answer)/len(answer) #Average
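Putting the pieces above together, here is a minimal sketch that skips non-numeric lines and also copes with a file that has no valid numbers at all. It uses Python 3 syntax, and the function name sum_and_average is made up for illustration. Note that float() also accepts strings such as "1e5", "inf" and "nan", which may or may not count as valid numbers for your task.

```python
def sum_and_average(lines):
    """Return (total, average) of the lines that parse as floats.

    `lines` can be an open file or any other iterable of strings.
    """
    values = []
    for line in lines:
        try:
            # float() ignores surrounding whitespace, including the trailing newline
            values.append(float(line))
        except ValueError:
            continue  # not a number, e.g. "contains text"
    total = sum(values)
    # Avoid a ZeroDivisionError when no line held a valid number
    average = total / len(values) if values else None
    return total, average


sample = ["contains text\n", "79.3\n", "67\n", "text again\n", "6\n"]
total, average = sum_and_average(sample)
print("sum:", total)
print("average:", average)
```

With a real file you would pass the open file object instead of the sample list, for example inside with open('fileofnumbers.txt') as source.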

Let's try a list comprehension with try-except. This might be overkill, but it is a good tool to keep in your pocket. First, write a function that silences errors, as in http://code.activestate.com/recipes/576872-exception-handling-in-a-single-line/.
Then you can use it inside a list comprehension; the wrapper simply forwards whatever it receives through *args and **argv:
intxt = """contains text
29.3423
23.1544913425
4
36.5
text again
79.5074638
3
76.451
text again
84.52"""
with open('in.txt','w') as fout:
    fout.write(intxt)
def safecall(f, default=None, exception=Exception):
    '''Returns modified f. When the modified f is called and throws an
    exception, the default value is returned'''
    def _safecall(*args,**argv):
        try:
            return f(*args,**argv)
        except exception:
            return default
    return _safecall
with open('in.txt','r') as fin:
    numbers = [safecall(float, 0, exception=ValueError)(i) for i in fin]
print "sum:", sum(numbers)
print "avg:", sum(numbers)/float(len(numbers))
[out]:
sum: 336.475255142
avg: 30.5886595584
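One thing to watch in the output above: the average is taken over all 11 lines, because the three text lines were turned into 0 and still count towards len(numbers). If the average should cover only the valid numbers, a possible variant (a sketch in Python 3 syntax; the helper name to_float is an assumption, not part of the recipe) uses None as the default and filters it out:

```python
def safecall(f, default=None, exception=Exception):
    """Wrap f so that the given exception yields `default` instead of propagating."""
    def _safecall(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except exception:
            return default
    return _safecall


# Hypothetical helper: float() that returns None for unparsable input
to_float = safecall(float, None, exception=ValueError)

lines = ["contains text", "29.3423", "4", "text again", "36.5"]
parsed = [to_float(line) for line in lines]
numbers = [x for x in parsed if x is not None]  # drop lines that failed to parse

print("sum:", sum(numbers))
print("avg:", sum(numbers) / len(numbers))
```

Filtering on None rather than on falsiness keeps a legitimate 0 in the data.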

Related

Iterating through characters in a python string and summing their values

So I have a very simple task. The Project Euler problem Names Scores gives us a file with a set of strings (which are names). You have to sort these names in alphabetical order, compute what is known as a name score for each of them, and sum the scores up. The name score calculation is pretty simple: take a name, sum the alphabetical values of its letters, and multiply this sum by the position the name has in the list. This seems like a pretty simple question.
Being a Python beginner, I wanted to try this out in Python, and this is the code I wrote. I also tried list comprehensions along with sum, but that gives me the same answer. Here is my code:
def name_score(s):
    # print sum((ord(c)-96) for c in s)
    s1 = 0;
    for c in s:
        s1 = s1 + (ord(c) - 96)
    print s1
    return s1
    # print ord(c) - 96
myList = []
f = open('p022_names.txt')
for line in f:
    myList.append(line.lower())
count = 0;
totalSum = 0;
for line in sorted(myList):
    count = count + 1;
    totalSum += (name_score(line) * count)
print totalSum
Now the file p022_names.txt contains only one line, "colin", so name_score("colin") should return 53. No matter what I try, I always end up getting the value -33. I am using PyDev on Eclipse. Here is a curious anomaly: if I populate the list directly in the code with myList = ["colin"], I get the correct answer. Honestly, I don't know what is happening. Can anybody shed some light on this? There is a similar loop in the program to calculate totalSum, but that doesn't seem to have an issue.
[EDIT] After the issue was pointed out, I am posting an updated revision of the code which works.
def name_score(s):
    return sum((ord(c)-96) for c in s)
with open('p022_names.txt') as f:
    myList = f.read().splitlines()
print sum((name_score(line.lower()) * (ind+1)) for ind,line in enumerate(sorted(myList)))
96 - 53 - 33 = 10, and 10 is the character code of the newline.
That happens because you have a newline character ("\n") in your file, so your line is not "colin" but "colin\n".
There are several ways to get rid of the newline character. Here is one:
Replace your line:
for line in f:
with:
for line in f.read().splitlines():
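The arithmetic behind the -33 can be checked directly. This small sketch (Python 3 syntax) reuses the name_score function from the question's edit and shows the effect of the trailing newline:

```python
def name_score(s):
    # 'a' has code 97, so ord(c) - 96 maps a..z to 1..26
    return sum(ord(c) - 96 for c in s)


print(name_score("colin"))                  # 3 + 15 + 12 + 9 + 14 = 53
print(ord("\n") - 96)                       # 10 - 96 = -86
print(name_score("colin\n"))                # 53 - 86 = -33
print(name_score("colin\n".rstrip("\n")))   # newline removed: 53
```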
Could it be because you didn't close the file? As in f.close()?

Removing first character from string

I am working on a CodeEval challenge and have a solution to a problem which takes a list of numbers as an input and then outputs the sum of the digits of each line. Here is my code to make certain you understand what I mean:
import sys
test_cases = open(sys.argv[1], 'r')
for test in test_cases:
    if test:
        num = int(test)
        total =0
        while num != 0:
            total += num % 10
            num /= 10
        print total
test_cases.close()
I am attempting to rewrite this so that it takes the number as a string, repeatedly slices off the character at index 0, and adds those digits together (I am curious to see what the time and memory differences are; I am totally new to coding and trying to find multiple ways to do things).
However, I am stuck on getting this to execute. This is what I have:
import sys
test_cases = open(sys.argv[1], 'r')
for test in test_cases:
    sums = 0
    while test:
        sums = sums + int(str(test)[0])
        test = test[1:]
    print sums
test_cases.close()
I am receiving a "ValueError: invalid literal for int() with base 10: ''"
The sample input is a text file which looks like this:
3011
6890
8778
1844
42
8849
3847
8985
5048
7350
8121
5421
7026
4246
4439
6993
4761
3658
6049
1177
Thanks for any help you can offer!
Your issue is the newlines (e.g. \n or \r\n) at the end of each line.
Change this line:
for test in test_cases:
into this to split out the newlines:
for test in test_cases.read().splitlines():
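Once the newline is stripped, the string-based digit sum the question is aiming for fits in one expression. A minimal sketch in Python 3 syntax, where digit_sum is a made-up name and the input is assumed to contain only non-negative integers:

```python
def digit_sum(line):
    """Sum the decimal digits of a line such as '3011\n'."""
    # strip() removes the trailing '\n' or '\r\n' before the characters are converted
    return sum(int(ch) for ch in line.strip())


for raw in ["3011\n", "6890\n", "42\n"]:
    print(digit_sum(raw))   # prints 5, then 23, then 6
```

A blank line simply gives 0 here, while a minus sign or any other non-digit character would still raise ValueError.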
Try this code:
import sys
tot = 0
with open(sys.argv[1], 'r') as f:
    for line in f:
        try:
            tot += int(line)
        except ValueError:
            print "Not a number"
print tot
Using the context manager (with ...), the file is closed automatically.
Converting with int() filters out any empty or invalid value.
You can replace print with any other statement that suits you (raise or pass, depending on your goals).

replacing text in a file, Python

So this piece of code is meant to take a line from a file and replace that line with a new word or number, but it doesn't seem to work :(
else:
    with open('newfile', 'r+')as myfile:
        x=input("what would you like to change: \nname \ncolour \nnumber \nenter option:")
        if x == "name":
            print("your current name is:")
            test_lines = myfile.readlines()
            print(test_lines[0])
            y=input("change name to:")
            content = (y)
            myfile.write(str.replace((test_lines[0]), str(content)))
I get the error message TypeError: replace() takes at least 2 arguments (1 given). I don't know why (content) is not accepted as an argument. This also happens for the code below:
        if x == "number":
            print ("your current fav. number is:")
            test_lines = myfile.readlines()
            print(test_lines[2])
            number=(int(input("times fav number by a number to get your new number \ne.g 5*2 = 10 \nnew number:")))
            result = (int(test_lines[2])*(number))
            print (result)
            myfile.write(str.replace((test_lines[2]), str(result)))
f=open('newfile', 'r')
print("now we will print the file:")
for line in f:
    print (line)
f.close()
replace is a method of a str object.
It sounds like you want to do something like this (a guess, not knowing your inputs):
test_lines[0].replace(test_lines[0],str(content))
I'm not sure what you're trying to accomplish with the logic in there. It looks like you want to remove that line completely and replace it?
Also, I'm unsure what you are trying to do with
content = (y)
The output of input is already a str (which is what you want).
EDIT:
In your specific case (replacing a whole line) I would suggest just reassigning that item in the list, e.g.
test_lines[0] = content
To overwrite the file, seek back to the beginning once you have made your changes in memory, rewrite everything, and then truncate so that no leftover bytes from the old, longer content remain.
# Your logic for replacing the line or desired changes
myfile.seek(0)
for l in test_lines:
    myfile.write("%s\n" % l.rstrip("\n"))  # lines from readlines() already end in "\n"
myfile.truncate()
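As a self-contained illustration of the seek, write, truncate sequence, here is a sketch in Python 3 syntax. The function name replace_line and the sample file contents are assumptions made up for the demo, which runs on a temporary file rather than on the asker's 'newfile'.

```python
import os
import tempfile


def replace_line(path, index, new_text):
    """Replace line `index` (0-based) of the file at `path` with `new_text`."""
    with open(path, "r+") as f:
        lines = f.read().splitlines()      # lines without their trailing newlines
        lines[index] = new_text
        f.seek(0)                          # go back to the start before writing
        f.write("\n".join(lines) + "\n")
        f.truncate()                       # cut off leftovers if the new content is shorter


# Demo on a throwaway file with made-up contents
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("Alexander\nblue\n7\n")

replace_line(demo_path, 0, "Bob")
with open(demo_path) as f:
    print(f.read())
os.remove(demo_path)
```

Without the truncate() call, the tail of the old, longer content would be left behind after the newly written text.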
Try this:
test_lines = myfile.readlines()
print(test_lines[0])
y = input("change name to:")
content = str(y)
myfile.write(test_lines[0].replace(test_lines[0], content))
When replace() is called directly on the str class, its first argument is taken as the string to operate on, and it must be followed by old and new. Your call passes only two values, so new is missing, hence the error. It is simpler to call replace() on a string object, such as test_lines[0].
You may still need to change your actual program flow, but this should get around the error.
You need to call it as test_lines[0].replace(test_lines[0],str(content))
Calling help(str.replace) at the interpreter gives:
replace(...)
S.replace(old, new[, count]) -> str
Return a copy of S with all occurrences of substring
old replaced by new. If the optional argument count is
given, only the first count occurrences are replaced.
Couldn't find the docs.
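To make the signature above concrete, this short sketch (Python 3 syntax, with made-up sample values) contrasts the bound call, the equivalent call through the str class, and the two-value call from the question:

```python
line = "Alice\n"       # made-up sample line, as readlines() would return it
new_name = "Bob"

# Bound call: the string before the dot is the one being searched
print(repr(line.replace("Alice", new_name)))

# Equivalent call through the class: the string to operate on comes first
print(repr(str.replace(line, "Alice", new_name)))

# The call from the question passes only two values, so `new` is missing
try:
    str.replace(line, new_name)
except TypeError as exc:
    print("TypeError:", exc)
```

Strings are immutable, so in every case replace() returns a new string and leaves line unchanged.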

Conversion of Multiple Strings To ASCII

This seems fairly trivial but I can't seem to work it out
I have a text file with the contents:
B>F
I am reading this with the code below, stripping the '>' and trying to convert the strings into their corresponding ASCII value, minus 65 to give me a value that will correspond to another list index
def readRoute():
    routeFile = open('route.txt', 'r')
    for line in routeFile.readlines():
        route = line.strip('\n' '\r')
        route = line.split('>')
        #startNode, endNode = route
        startNode = ord(route[0])-65
        endNode = ord(route[1])-65
    # Debug (this comment was for my use to explain below the print values)
    print 'Route Entered:'
    print line
    print startNode, ',', endNode, '\n'
    return[startNode, endNode]
However I am having slight trouble doing the conversion nicely, because the text file only contains one line at the moment but ideally I need it to be able to support more than one line and run an amount of code for each line.
For example it could contain:
B>F
A>D
C>F
E>D
So I would want to run the same code outside this function 4 times with the different inputs
Anyone able to give me a hand
Edit:
Not sure I made my issue that clear, sorry.
What I need it to do is parse the text file (possibly containing one line, or multiple lines like above). I am able to do it for one line with the lines
startNode = ord(route[0])-65
endNode = ord(route[1])-65
But I get errors when trying to do more than one line because the ord() is expecting different inputs
If I have (below) in the route.txt
B>F
A>D
This is the error it gives me:
line 43, in readRoute endNode = ord(route[1])-65
TypeError: ord() expected a character, but string of length 2 found
My code above should read the route.txt file and see that B>F is the first route, strip the '>' - convert the B & F to ASCII, so 66 & 70 respectively then minus 65 from both to give 1 & 5 (in this example)
The 1 & 5 are corresponding indexes for another "array" (list of lists) to do computations and other things on
Once the other code has completed it can then go to the next line in route.txt which could be A>D and perform the above again
Perhaps this will work for you. I turned the fileread into a generator so you can do as you please with the parsed results in the for-i loop.
def readRoute(file_name):
    with open(file_name, 'r') as r:
        for line in r:
            yield (ord(line[0])-65, ord(line[2])-65)
filename = 'route.txt'
for startnode, endnode in readRoute(filename):
    print startnode, endnode
If you can't change readRoute, change the contents of the file before each call. Better yet, make readRoute take the filename as a parameter (default it to 'route.txt' to preserve the current behavior) so you can have it process other files.
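The TypeError in the question comes from splitting the unstripped line: with more than one line in the file, route[1] is 'F\n', a string of length 2. A sketch of the filename-parameter idea that strips before splitting is shown below (Python 3 syntax; parse_route and read_routes are names invented for this example):

```python
def parse_route(line):
    """Turn a line like 'B>F\n' into a (start, end) pair of 0-based indexes."""
    start, end = line.strip().split(">")    # strip first, so no '\n' clings to `end`
    return ord(start) - 65, ord(end) - 65   # 'A' is 65, so A -> 0, B -> 1, ...


def read_routes(file_name="route.txt"):
    """Yield one (start, end) pair per non-blank line of the file."""
    with open(file_name) as routes:
        for line in routes:
            if line.strip():
                yield parse_route(line)


for text in ["B>F\n", "A>D\n"]:
    print(parse_route(text))   # (1, 5) then (0, 3)
```

Because read_routes is a generator, the caller can run its own code for each pair before the next line is read.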
What about something like this? It takes the routes defined in your file and turns them into path objects with start and end member variables. As an added bonus PathManager.readFile() allows you to load multiple route files without overwriting the existing paths.
import re
class Path:
    def __init__(self, start, end):
        self.start = ord(start) - 65 # Scale the values as desired
        self.end = ord(end) - 65 # Scale the values as desired
class PathManager:
    def __init__(self):
        self.expr = re.compile("^([A-Za-z])[>]([A-Za-z])$") # looks for string "C>C"
                                                           # where C is a char
        self.paths = []
    def do_logic_routine(self, start, end):
        # Do custom logic here that will execute before the next line is read
        # Return True for 'continue reading' or False to stop parsing file
        return True
    def readFile(self, path):
        file = open(path,"r")
        for line in file:
            item = self.expr.match(line.strip()) # strip whitespaces before parsing
            if item:
                '''
                item.group(0) is *not* used here; it matches the whole expression
                item.group(1) matches the first parenthesis in the regular expression
                item.group(2) matches the second
                '''
                self.paths.append(Path(item.group(1), item.group(2)))
                # do_logic_routine is a method, so it has to be called through self
                if not self.do_logic_routine(self.paths[-1].start, self.paths[-1].end):
                    break
        file.close()
# Running the example
MyManager = PathManager()
MyManager.readFile('route.txt')
for path in MyManager.paths:
    print "Start: %s End: %s" % (path.start, path.end)
Output is:
Start: 1 End: 5
Start: 0 End: 3
Start: 2 End: 5
Start: 4 End: 3

Python: Calculating the averages of values in a text file

When I run my code below I get ValueError: invalid literal for int() with base 10: '0.977759164126', but I don't know why.
file_open = open("A1_B1_1000.txt", "r")
file_write = open ("average.txt", "w")
line = file_open.readlines()
list_of_lines = []
length = len(list_of_lines[0])
total = 0
for i in line:
    values = i.split('\t')
    list_of_lines.append(values)
count = 0
for j in list_of_lines:
    count +=1
for k in range(0,count):
    print k
    list_of_lines[k].remove('\n')
for o in range(0,count):
    for p in range(0,length):
        print list_of_lines[p][o]
        number = int(list_of_lines[p][o])
        total + number
    average = total/count
    print average
My text file looks like:
0.977759164126 0.977759164126 0.977759164126 0.977759164126 0.977759164126
0.981717034466 0.981717034466 0.981717034466 0.981717034466 0.98171703446
The data series is in rows and the values are tab delimited in the text file. All the rows in the file are the same length.
The aim of the script is to calculate the average of each column and write the output to a text file.
int() is used for integers (numbers like 7, 12, 7965, 0, -21233). You probably need float().
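With float() in place, the per-column average the question is after can be sketched as follows (Python 3 syntax; column_averages is a made-up name, and the rows are assumed to be whitespace-separated and all the same length, as the question states):

```python
def column_averages(lines):
    """Average each column of whitespace-separated numeric rows."""
    # split() with no argument splits on tabs and spaces and drops the trailing newline
    rows = [[float(v) for v in line.split()] for line in lines if line.strip()]
    # zip(*rows) regroups the values column by column
    return [sum(col) / len(col) for col in zip(*rows)]


sample = ["1.0\t2.0\t3.0\n", "3.0\t4.0\t5.0\n"]
print(column_averages(sample))   # [2.0, 3.0, 4.0]
```

To write the result out, the averages could be joined with tabs and written to average.txt in the same way the input is laid out.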
Python floats are binary floating point numbers with limited precision. These values all work fine here, but for longer ones, or where exact decimal arithmetic matters, you are going to want to use the decimal module.
from decimal import Decimal
result = Decimal(1)/Decimal(5)
print result
Link to the documentation
http://docs.python.org/2/library/decimal.html
Try typing in 1.1 into IDLE and see what your result is.
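The difference is easy to see side by side. A short sketch in Python 3 syntax; note that Decimal values are best built from strings, since building them from a float carries the float's binary rounding along:

```python
from decimal import Decimal

print(0.1 + 0.2)                         # 0.30000000000000004 (binary floating point)
print(Decimal("0.1") + Decimal("0.2"))   # 0.3 (exact decimal arithmetic)
print(Decimal(1) / Decimal(5))           # 0.2
print(Decimal(1.1))                      # the exact binary value stored for the float 1.1
```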
