Python: Calculating the averages of values in a text file

Python: Calculating the averages of values in a text file - python

When I run my code below I get a: ValueError: invalid literal for int() with base 10: '0.977759164126' but i dont know why
file_open = open("A1_B1_1000.txt", "r")
file_write = open ("average.txt", "w")
line = file_open.readlines()
list_of_lines = []
length = len(list_of_lines[0])
total = 0
for i in line:
values = i.split('\t')
list_of_lines.append(values)
count = 0
for j in list_of_lines:
count +=1
for k in range(0,count):
print k
list_of_lines[k].remove('\n')
for o in range(0,count):
for p in range(0,length):
print list_of_lines[p][o]
number = int(list_of_lines[p][o])
total + number
average = total/count
print average
My text file looks like:
0.977759164126 0.977759164126 0.977759164126 0.977759164126 0.977759164126
0.981717034466 0.981717034466 0.981717034466 0.981717034466 0.98171703446
The data series is in rows and the values are tab delimited in the text file. All the rows in the file are the same length.
The aim of the script is to calculate the average of each column and write the output to a text file.

int() is used for integers (numbers like 7, 12, 7965, 0, -21233). you probably need float()

Python is limited on handling floating points. These all work fine here but for longer ones as well as arithmetic you are going to want to use the Decimal module.
import Decimal
result = Decimal.Decimal(1)/Decimal.Decimal(5)
print result
Link to the documentation
http://docs.python.org/2/library/decimal.html
Try typing in 1.1 into IDLE and see what your result is.

Related

Formatted strings, decimals and commas question

I have a .txt file that I read in and wish to create formatted strings using these values. Columns 3 and 4 need decimals and the last column needs a percent sign and 2 decimal places. The formatted string will say something like "The overall attendance at Bulls was 894659, average attendance was 21,820 and the capacity was 104.30%’
the shortened .txt file has these lines:
1 Bulls 894659 21820 104.3
2 Cavaliers 843042 20562 100
3 Mavericks 825901 20143 104.9
4 Raptors 812863 19825 100.1
5 NY_Knicks 812292 19812 100
So far my code looks like this and its mostly working, minus the commas and decimal places.
file_1 = open ('basketball.txt', 'r')
count = 0
list_1 = [ ]
for line in file_1:
count += 1
textline = line.strip()
items = textline.split()
list_1.append(items)
print('Number of teams: ', count)
for line in list_1:
print ('Line: ', line)
file_1.close()
for line in list_1: #iterate over the lines of the file and print the lines with formatted strings
a, b, c, d, e = line
print (f'The overall attendance at the {b} game was {c}, average attendance was {d}, and the capacity was {e}%.')
Any help with how to format the code to show the numbers with commas (21820 ->21,828) and last column with 2 decimals and a percent sign (104.3 -> 104.30%) is greatly appreciated.

You've got some options for how to tackle this.
Option 1: Using f strings (Python 3 only)
Since your provided code already uses f strings, this solution should work for you. For others reading here, this will only work if you are using Python 3.
You can do string formatting within f strings, signified by putting a colon : after the variable name within the curly brackets {}, after which you can use all of the usual python string formatting options.
Thus, you could just change one of your lines of code to get this done. Your print line would look like:
print(f'The overall attendance at the {b} game was {int(c):,}, average attendance was {int(d):,}, and the capacity was {float(e):.2f}%.')
The variables are getting interpreted as:
The {b} just prints the string b.
The {int(c):,} and {int(d):,} print the integer versions of c and d, respectively, with commas (indicated by the :,).
The {float(e):.2f} prints the float version of e with two decimal places (indicated by the :.2f).
Option 2: Using string.format()
For others here who are looking for a Python 2 friendly solution, you can change the print line to the following:
print("The overall attendance at the {} game was {:,}, average attendance was {:,}, and the capacity was {:.2f}%.".format(b, int(c), int(d), float(e)))
Note that both options use the same formatting syntax, just the f string option has the benefit of having you write your variable name right where it will appear in the resulting printed string.

This is how I ended up doing it, very similar to the response from Bibit.
file_1 = open ('something.txt', 'r')
count = 0
list_1 = [ ]
for line in file_1:
count += 1
textline = line.strip()
items = textline.split()
items[2] = int(items[2])
items[3] = int(items[3])
items[4] = float(items[4])
list_1.append(items)
print('Number of teams/rows: ', count)
for line in list_1:
print ('Line: ', line)
file_1.close()
for line in list_1:
print ('The overall attendance at the {:s} games was {:,}, average attendance was {:,}, and the capacity was {:.2f}%.'.format(line[1], line[2], line[3], line[4]))

How to insert random spaces in txt file?

I have a file with lines of DNA in a file called 'DNASeq.txt'. I need a code to read each line and split each line at random places (inserting spaces) throughout the line. Each line needs to be split at different places.
EX: I have:
AAACCCHTHTHDAFHDSAFJANFAJDSNFADKFAFJ
And I need something like this:
AAA ADSF DFAFDDSAF ADF ADSF AFD AFAD
I have tried (!!!very new to python!!):
import random
for x in range(10):
print(random.randint(50,250))
but that prints me random numbers. Is there some way to get a random number generated as like a variable?

You can read a file line wise, write each line character-wise in a new file and insert spaces randomly:
Create demo file without spaces:
with open("t.txt","w") as f:
f.write("""ASDFSFDGHJEQWRJIJG
ASDFJSDGFIJ
SADFJSDFJJDSFJIDFJGIJSRGJSDJFIDJFG
SDFJGIKDSFGOROHPTLPASDMKFGDOKRAMGO""")
Read and rewrite demo file:
import random
max_no_space = 9 # if max sequence length without space
no_space = 0
with open("t.txt","r") as f, open("n.txt","w") as w:
for line in f:
for c in line:
w.write(c)
if random.randint(1,6) == 1 or no_space >= max_no_space:
w.write(" ")
no_space = 0
else:
no_space += 1
with open("n.txt") as k:
print(k.read())
Output:
ASDF SFD GHJEQWRJIJG
A SDFJ SDG FIJ
SADFJSD FJ JDSFJIDFJG I JSRGJSDJ FIDJFG
The pattern of spaces is random. You can influence it by settin max_no_spaces or remove the randomness to split after max_no_spaces all the time
Edit:
This way of writing 1 character at a time if you need to read 200+ en block is not very economic, you can do it with the same code like so:
with open("t.txt","w") as f:
f.write("""ASDFSFDGHJEQWRJIJSADFJSDFJJDSFJIDFJGIJSRGJSDJFIDJFGG
ASDFJSDGFIJSADFJSDFJJDSFJIDFJGIJSRGJSDJFIDJFGSADFJSDFJJDSFJIDFJGIJK
SADFJSDFJJDSFJIDFJGIJSRGJSDJFIDJFGSADFJSDFJJDSFJIDFJGIJSRGJSDJFIDJF
SDFJGIKDSFGOROHPTLPASDMKFGDOKRAMGSADFJSDFJJDSFJIDFJGIJSRGJSDJFIDJFG""")
import random
min_no_space = 10
max_no_space = 20 # if max sequence length without space
no_space = 0
with open("t.txt","r") as f, open("n.txt","w") as w:
for line in f:
for c in line:
w.write(c)
if no_space > min_no_space:
if random.randint(1,6) == 1 or no_space >= max_no_space:
w.write(" ")
no_space = 0
else:
no_space += 1
with open("n.txt") as k:
print(k.read())
Output:
ASDFSFDGHJEQ WRJIJSADFJSDF JJDSFJIDFJGIJ SRGJSDJFIDJFGG
ASDFJSDGFIJSA DFJSDFJJDSFJIDF JGIJSRGJSDJFIDJ FGSADFJSDFJJ DSFJIDFJGIJK
SADFJ SDFJJDSFJIDFJG IJSRGJSDJFIDJ FGSADFJSDFJJDS FJIDFJGIJSRG JSDJFIDJF
SDFJG IKDSFGOROHPTLPASDMKFGD OKRAMGSADFJSDF JJDSFJIDFJGI JSRGJSDJFIDJFG

If you want to split your DNA fixed amount of times (10 in my example) here's what you could try:
import random
DNA = 'AAACCCHTHTHDAFHDSAFJANFAJDSNFADKFAFJ'
splitted_DNA = ''
for split_idx in sorted(random.sample(range(len(DNA)), 10)):
splitted_DNA += DNA[len(splitted_DNA)-splitted_DNA.count(' ') :split_idx] + ' '
splitted_DNA += DNA[split_idx:]
print(splitted_DNA) # -> AAACCCHT HTH D AF HD SA F JANFAJDSNFA DK FAFJ

import random
with open('source', 'r') as in_file:
with open('dest', 'w') as out_file:
for line in in_file:
newLine = ''.join(map(lambda x:x+' '*random.randint(0,1), line)).strip() + '\n'
out_file.write(newLine)
Since you mentioned being new, I'll try to explain
I'm writing the new sequences to another file for precaution. It's
not safe to write to the file you are reading from.
The with constructor is so that you don't need to explicitly close
the file you opened.
Files can be read line by line using for loop.
''.join() converts a list to a string.
map() applies a function to every element of a list and returns the
results as a new list.
lambda is how you define a function without naming it. lambda x:
2*x doubles the number you feed it.
x + ' ' * 3 adds 3 spaces after x. random.randint(0, 1) returns
either 1 or 0. So I'm randomly selecting if I'll add a space after
each character or not. If the random.randint() returns 0, 0 spaces are added.

You can toss a coin after each character whether to add space there or not.
This function takes string as input and returns output with space inserted at random places.
def insert_random_spaces(str):
from random import randint
output_string = "".join([x+randint(0,1)*" " for x in str])
return output_string

Iterating through characters in a python string and summing their values

So I have a very simple task. The Project Euler problem Names Scores gives us a file with a set of strings(which are names). Now you have to sort these names in the alphabetical order and then compute what is known as a name score for each of these names and sum them all up. The name score calculation is pretty simple. All you have to do is take a name and then sum up the values of the alphabets in the name and then multiply this sum with the position that the name has on the list. Obviously this seems a pretty simple question.
Being a python beginner, I wanted to try this out on python and being a beginner this was the code I wrote out. I did use list comprehensions as well along with a sum, but that gives me the same answer. Here is my code:
def name_score(s):
# print sum((ord(c)-96) for c in s)
s1 = 0;
for c in s:
s1 = s1 + (ord(c) - 96)
print s1
return s1
# print ord(c) - 96
myList = []
f = open('p022_names.txt')
for line in f:
myList.append(line.lower())
count = 0;
totalSum = 0;
for line in sorted(myList):
count = count + 1;
totalSum += (name_score(line) * count)
print totalSum
Now the file p022_names.txt contains only one line "colin". So the function name_score("colin") should return 53. Now try whatever I always end up getting the value -33. I am using PyDev on Eclipse. Now here is a curious anomaly. If I just used the list variable and populated it with the value myList = ["colin"] in the code, I get the correct answer. Honestly I don't know what is happening. Can anybody throw some light into what is happening here. There is a similar loop also in the program to calculate totalSum, but that doesn't seem to have an issue.
[EDIT] After the issue was pointed out, I am posting an updated revision of the code which works.
def name_score(s):
return sum((ord(c)-96) for c in s)
with open('p022_names.txt') as f:
myList = f.read().splitlines()
print sum((name_score(line.lower()) * (ind+1)) for ind,line in enumerate(sorted(myList)))

96 - 53 - 33 = 10
That happens because you have a newline character ("\n") in your file, thus your line is not "colin" but "colin\n".
To get rid of the newline character, multiple approaches could work. Here is an example:
Replace your line:
for line in f:
with:
for line in f.read().splitlines():

Could it be because you didn't close the file? As in f.close()?

Removing first character from string

I am working on a CodeEval challenge and have a solution to a problem which takes a list of numbers as an input and then outputs the sum of the digits of each line. Here is my code to make certain you understand what I mean:
import sys
test_cases = open(sys.argv[1], 'r')
for test in test_cases:
if test:
num = int(test)
total =0
while num != 0:
total += num % 10
num /= 10
print total
test_cases.close()
I am attempting to rewrite this where it takes the number as a string, slices each 0-index, and then adds those together (curious to see what the time and memory differences are - totally new to coding and trying to find multiple ways to do things as well)
However, I am stuck on getting this to execute and have the following:
import sys
test_cases = open(sys.argv[1], 'r')
for test in test_cases:
sums = 0
while test:
sums = sums + int(str(test)[0])
test = test[1:]
print sums
test_cases.close()
I am receiving a "ValueError: invalid literal for int() with base 10: ''"
The sample input is a text file which looks like this:
3011
6890
8778
1844
42
8849
3847
8985
5048
7350
8121
5421
7026
4246
4439
6993
4761
3658
6049
1177
Thanks for any help you can offer!

Your issue is the newlines (eg. /n or /r/n) at the end of each line.
Change this line:
for test in test_cases:
into this to split out the newlines:
for test in test_cases.read().splitlines():

try this code:
tot = 0
with open(sys.argv[1], 'r') as f:
for line in f:
try:
tot += int(line)
except ValueError:
print "Not a number"
print tot
using the context manager (with...) the file is automatically closed.
casting to int filter any empty or not valid value
you can substitute print with any other statement optimal for you (raise or pass depending on your goals)

Python: Reading a file and calculating sum and average

The question is to read the file line by line and calculate and display the sum and average of all of the valid numbers in the file.
The text file is
contains text
79.3
56.15
67
6
text again
57.86
6
37.863
text again
456.675
That's all I have so far.
numbers = open('fileofnumbers.txt', 'r')
line = file_contents.readline()
numbers.close()
try:
sum = line + line
line = file_contents.readline()
print "The sum of the numbers is", sum
except ValueError:
print line

Using with notation can make dealing with files a lot more intuitive.
For instance, changing the opening and closing to this:
summation = 0
# Within the with block you now have access to the source variable
with open('fileofnumbers.txt', 'r') as source:
for line in source: #iterate through all the lines of the file
try:
# Since files are read in as strings, you have to cast each line to a float
summation += float(line)
except ValueError:
pass
Might get you started
If you want to be a little more clever, there's a convenient python function called isdigit, which checks if a string is all integer values, which can let you do very clever things like this:
is_number = lambda number: all(number.split('.').isdigit())
answer = [float(line) for line in open('fileofnumbers.txt') if is_number(line)]
Which then makes sum and average trivial:
print sum(answer) # Sum
print sum(answer)/len(answer) #Average

Let's try list comprehension with try-except. This might be an overkill but surely a good tool to keep in your pocket, first you write a function that will silence the errors as such in http://code.activestate.com/recipes/576872-exception-handling-in-a-single-line/.
Then you can use list comprehension by passing in argv like you do in Unix:
intxt = """contains text
29.3423
23.1544913425
4
36.5
text again
79.5074638
3
76.451
text again
84.52"""
with open('in.txt','w') as fout:
fout.write(intxt)
def safecall(f, default=None, exception=Exception):
'''Returns modified f. When the modified f is called and throws an
exception, the default value is returned'''
def _safecall(*args,**argv):
try:
return f(*args,**argv)
except exception:
return default
return _safecall
with open('in.txt','r') as fin:
numbers = [safecall(float, 0, exception=ValueError)(i) for i in fin]
print "sum:", sum(numbers)
print "avg:", sum(numbers)/float(len(numbers))
[out]:
sum: 336.475255142
avg: 30.5886595584

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python: Calculating the averages of values in a text file - python

int() is used for integers (numbers like 7, 12, 7965, 0, -21233). you probably need float()

Related

Formatted strings, decimals and commas question

How to insert random spaces in txt file?

Iterating through characters in a python string and summing their values

Removing first character from string

Python: Reading a file and calculating sum and average

Categories

Resources