Returning .txt file contents - python

I have a file, Testing.txt:
type,stan,820000000,92
paul,tanner,820000095,54
remmy,gono,820000046,68
bono,Jose,820000023,73
simple,rem,820000037,71
I'm trying to create a function that takes this file and returns:
The average of all the grades (last numbers in the file of each line),
and the ID (long numbers within file) of the highest and lowest grades.
I know how to get the average but am stuck trying to get the IDs.
So far my code looks like this:
#Function:
def avg_file(filename):
    with open(filename, 'r') as f:
        data = [int(line.split()[2]) for line in f]
    return sum(data)/len(data)
avg = avg_file(filename)
return avg
#main program:
import q3_function
filename = "testing.txt"
average = q3_function.avg_file(filename)
print (average)

You can use a list comprehension to get the desired pairs of ID and score:
>>> l = [i.strip().split(',')[-2:] for i in open(filename) if i.strip()]
>>> l
[['820000000', '92'], ['820000095', '54'], ['820000046', '68'], ['820000023', '73'], ['820000037', '71']]
Then, to calculate the average, you can use zip together with the map and sum functions:
>>> avg = sum(map(int, list(zip(*l))[1])) / len(l)
>>> avg
71.6
And for min and max, use the built-in functions min and max with a key that compares the scores as integers (comparing the raw strings would rank '100' below '54'):
max_id = max(l, key=lambda pair: int(pair[1]))[0]
min_id = min(l, key=lambda pair: int(pair[1]))[0]
Demo:
>>> max(l, key=lambda pair: int(pair[1]))
['820000000', '92']
>>> max(l, key=lambda pair: int(pair[1]))[0]
'820000000'
>>> min(l, key=lambda pair: int(pair[1]))
['820000095', '54']
>>> min(l, key=lambda pair: int(pair[1]))[0]
'820000095'

I think using the Python csv module would help.
Here are several examples: http://nbviewer.ipython.org/github/rasbt/python_reference/blob/master/tutorials/sorting_csvs.ipynb
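As a rough sketch of what the csv approach could look like for the file in the question: the function name grade_stats is made up for illustration, and it assumes every non-empty row has exactly the four fields shown (first name, last name, ID, grade).

```python
import csv


def grade_stats(filename):
    """Return (average grade, ID of highest grade, ID of lowest grade)."""
    with open(filename, newline="") as f:
        # Assumed row layout: first name, last name, ID, grade.
        rows = [(row[2], int(row[3])) for row in csv.reader(f) if row]
    average = sum(grade for _, grade in rows) / len(rows)
    # Compare on the integer grade, then keep only the ID.
    highest_id = max(rows, key=lambda pair: pair[1])[0]
    lowest_id = min(rows, key=lambda pair: pair[1])[0]
    return average, highest_id, lowest_id
```

Calling grade_stats("Testing.txt") on the sample data should give the average together with the two IDs as strings; an empty file would raise ZeroDivisionError, which this sketch does not handle.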

Related

How to sum values of an identical key

I need Python to read a .txt file and sum up the hours each student has attended school for the year. I need help understanding how to do this when the same student has multiple lines in the file. The .txt file looks something like this:
John0550
John0550
Sally1007
And the ultimate result I'm looking for in Python is to print out a list like:
John has attended 1100 hours
Sally has attended 1007 hours
I know I can't rely on a dict() because it won't accommodate identical keys. So what is the best way to do this?
Suppose you already have a function named split_line that returns the student's name / hours attended pair for each line. Your algorithm would look like:
hours_attended_per_student = {} # Create an empty dict
with open("my_file.txt", "r") as file:
    for line in file.readlines():
        name, hours = split_line(line)
        # Check whether you have already counted some hours for this student
        if name not in hours_attended_per_student:
            # Student was not encountered yet, set its hours to 0
            hours_attended_per_student[name] = 0
        # Now that student name is in dict, increase the amount of hours attended for the student
        hours_attended_per_student[name] += hours
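The answer above leaves split_line undefined, so here is one possible sketch of it. It is hypothetical and assumes that names never contain digits and that the hours follow the name directly, as in the sample file; total_hours is likewise an illustrative name.

```python
def split_line(line):
    """Hypothetical helper: split 'John0550' into ('John', 550).

    Assumes the name contains no digits and the hours follow it directly.
    """
    text = line.strip()
    name = text.rstrip("0123456789")  # drop the trailing digits
    hours = int(text[len(name):])     # the dropped part is the hours
    return name, hours


def total_hours(lines):
    """Sum the hours per student over an iterable of lines."""
    totals = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        name, hours = split_line(line)
        totals[name] = totals.get(name, 0) + hours
    return totals


totals = total_hours(["John0550\n", "John0550\n", "Sally1007\n"])
for name, hours in totals.items():
    print(f"{name} has attended {hours} hours")
```

An open file object can be passed to total_hours in place of the list, since iterating over a file yields its lines.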
A defaultdict could be helpful here:
import re
from collections import defaultdict
from io import StringIO
# Simulate File
with StringIO('''John0550
John0550
Sally1007''') as f:
    # Create defaultdict initialized at 0
    d = defaultdict(lambda: 0)
    # For each line in the file
    for line in f.readlines():
        # Split Name from Value
        name, value = re.split(r'(^[^\d]+)', line)[1:]
        # Sum Value into dict
        d[name] += int(value)
# For Display
print(dict(d))
Output:
{'John': 1100, 'Sally': 1007}
Assuming values are already split and parsed:
from collections import defaultdict
entries = [('John', 550), ('John', 550), ('Sally', 1007)]
d = defaultdict(int)
for name, value in entries:
    # Sum Value into dict
    d[name] += int(value)
# For Display
print(dict(d))

Searching for the lowest value in a file

I've included a test sample of my golf.txt file and the code I've used to print the results.
golf.txt:
Andrew
53
Dougie
21
The code to open this file and print the results (to keep this short I only have two players and two scores):
golfData = open('golf.txt','r')
whichLine=golfData.readlines()
for i in range(0,len(whichLine),2):
    print('Name:'+whichLine[i])
    print('Score:'+whichLine[i+1])
golfData.close()
Can I modify the code I have to pull out the minimum player score with name? I believe I can without writing to a list or dictionary but have NO clue how.
Help/suggestions much appreciated.
Use the min() function for that:
with open('file.txt') as f_in:
    min_player, min_score = min(zip(f_in, f_in), key=lambda k: int(k[1]))
print(min_player, min_score)
Prints:
Dougie
21
As @h0r53 indicated, you can try something like this:
golfData = open('golf.txt','r')
whichLine=golfData.readlines()
lowest=float('Inf')
Name=''
for i in range(0,len(whichLine),2):
    if float(whichLine[i+1])<lowest:
        lowest=float(whichLine[i+1])
        Name=whichLine[i]
golfData.close()
print(Name)
print(lowest)
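Both answers can be combined into a version that never stores the whole file: it walks the lines two at a time and keeps only the best pair seen so far. This is a sketch; lowest_score is an illustrative name, and it assumes the input strictly alternates name line and integer score line.

```python
def lowest_score(lines):
    """Return (name, score) for the lowest score, or (None, None) if empty.

    `lines` is any iterable of text lines alternating name, score.
    """
    it = iter(lines)
    best_name, best_score = None, None
    # zip(it, it) pulls two consecutive lines per step: name, then score.
    for name, score in zip(it, it):
        score = int(score)
        if best_score is None or score < best_score:
            best_name, best_score = name.strip(), score
    return best_name, best_score


print(lowest_score(["Andrew\n", "53\n", "Dougie\n", "21\n"]))
```

With a real file this would be called as lowest_score(f) inside a with open('golf.txt') as f: block.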

Is there a way in Python to find a file with the smallest number in its name?

I have a bunch of documents created by one script that are all called like this:
name_*score*
*score* is a float and I need in another script to identify the file with the smallest number in the folder. Example:
name_123.12
name_145.45
This should return string "name_123.12"
min takes a key function. You can use that to define the way min is calculated:
files = [
    "name_123.12",
    "name_145.45",
    "name_121.45",
    "name_121.457"
]
min(files, key=lambda x: float((x.split('_')[1])))
# name_121.45
You can try extracting the number part first, then convert it to float and take the minimum.
For example:
new_list = [float(name[5:]) for name in YOURLIST] # trim off the 'name_' prefix and convert to float
result = 'name_' + str(min(new_list)) # this is your result, provided str() reproduces the number exactly as written in the file name
Just wanted to say Mark Meyer is completely right on this one, but you also mentioned that you were reading these file names from a directory. In that case, there is a bit of code you could add to Mark's answer:
import glob, os
os.chdir("/path/to/directory")
files = glob.glob("*")
print(min(files, key=lambda x: float((x.split('_')[1]))))
A way to get the lowest value by providing a directory.
import os
import re
def get_lowest(directory):
    lowest = float('inf')  # sys.maxint only exists in Python 2
    for filename in os.listdir(directory):
        match = re.match(r'name_\d+(?:\.\d+)', filename)
        if match:
            number = re.search(r'(?<=_)\d+(?:\.\d+)', match.group(0))
            if number:
                value = float(number.group(0))
                if value < lowest:
                    lowest = value
    return lowest
print(get_lowest('./'))
Expanded on Tim Biegeleisen's answer, thank you Tim!
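Since the question asks for the file name rather than the number, here is a sketch that scans a directory and returns the name itself. The function name and SCORE_PATTERN are illustrative, and the pattern assumes files are called exactly name_ followed by a number.

```python
import os
import re

# Assumed naming scheme from the question: "name_" followed by a number.
SCORE_PATTERN = re.compile(r"name_(\d+(?:\.\d+)?)$")


def file_with_lowest_score(directory):
    """Return the file name with the smallest score, or None if none match."""
    best_name, best_score = None, float("inf")
    for filename in os.listdir(directory):
        match = SCORE_PATTERN.match(filename)
        if not match:
            continue  # ignore files that do not follow the scheme
        score = float(match.group(1))
        if score < best_score:
            best_name, best_score = filename, score
    return best_name
```

Returning the original file name avoids rebuilding it from the float, so a name such as name_121.450 comes back unchanged.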

How to find the average of values in a .txt file

I need to find the minimum, maximum, and average of values given in a .txt file. I've been able to find the minimum and maximum values but I'm struggling with finding the average of values. I haven't written any code for determining the average as I have no clue where to start. My current code is:
def summaryStats():
    filename = input("Enter a file name: ")
    file = open(filename)
    data = file.readlines()
    data = data[0:]
    print("The minimum value is " + min(data))
    print("The maximum value is " + max(data))
I need to be able to return the average of these values. As of now the .txt document has the following values:
893
255
504
I'm struggling on being able to find the average of these because every way I try to find the sum my result is 0.
Thanks
(sorry I'm just learning to work with files)
You should convert the data retrieved from the file to integers first, because your data list contains strings, not numbers. After the conversion to integers, the average can be found easily:
Why conversion to int is required?
>>> '2' > '10' #strings are compared lexicographically
True
Code:
def summaryStats():
    filename = input("Enter a file name: ")
    with open(filename) as f:
        data = [int(line) for line in f]
    # print() already separates its arguments with a space
    print("The minimum value is", min(data))
    print("The maximum value is", max(data))
    print("The average value is", sum(data)/len(data))
Output:
Enter a file name: abc1
The minimum value is 255
The maximum value is 893
The average value is 550.6666666666666
Don't reinvent the wheel: use numpy, it only takes a couple of instructions. You can import a txt file into a numpy array and then use the built-in functions to perform the operations you want:
>>> import numpy as np
>>> data = np.loadtxt('data.txt')
>>> np.average(data)
550.666666667
>>> np.max(data)
893.0
>>> np.min(data)
255.0
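If installing numpy is not an option, the standard library's statistics module covers the average as well. A minimal sketch, assuming one integer per line; summary_stats is an illustrative name, and blank lines are skipped.

```python
from statistics import mean


def summary_stats(lines):
    """Return (minimum, maximum, average) of the integers in `lines`."""
    # Skip blank lines so a trailing newline in the file does not break int().
    values = [int(line) for line in lines if line.strip()]
    return min(values), max(values), mean(values)


print(summary_stats(["893\n", "255\n", "504\n"]))
```

An open file can be passed directly, for example summary_stats(f) inside a with open(filename) as f: block.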

Why doesn't this return the average of the column of the CSV file?

def averager(filename):
    f=open(filename, "r")
    avg=f.readlines()
    f.close()
    avgr=[]
    final=""
    x=0
    i=0
    while i < range(len(avg[0])):
        while x < range(len(avg)):
            avgr+=str((avg[x[i]]))
            x+=1
        final+=str((sum(avgr)/(len(avgr))))
        clear(avgr)
        i+=1
    return final
The error I get is:
File "C:\Users\konrad\Desktop\exp\trail3.py", line 11, in averager
avgr+=str((avg[x[i]]))
TypeError: 'int' object has no attribute '__getitem__'
x is just an integer, so you can't index it.
So, this:
x[i]
Should never work. That's what the error is complaining about.
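To make the point concrete, here is a small sketch with made-up data standing in for the result of f.readlines(). It shows the failing expression and one way to reach row x, column i instead; the exact wording of the error message differs between Python versions.

```python
lines = ["1,2,3\n", "4,5,6\n"]  # stand-in for f.readlines()
x, i = 1, 2                     # row number and column number

try:
    lines[x[i]]                 # the original expression: indexes the int x
except TypeError as error:
    print("TypeError:", error)

# Index the list with x, split that row on commas, then pick column i.
value = float(lines[x].strip().split(",")[i])
print(value)  # 6.0
```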
UPDATE
Since you asked for a recommendation on how to simplify your code (in a comment below), here goes:
Assuming your CSV file looks something like:
-9,2,12,90...
1423,1,51,-12...
...
You can read the file in like this:
with open(<filename>, 'r') as file_reader:
    file_lines = file_reader.read().split('\n')
Notice that I used .split('\n'). This causes the file's contents to be stored in file_lines as, well, a list of the lines in the file.
So, assuming you want the ith column to be summed, this can easily be done with comprehensions:
ith_col_sum = sum(float(line.split(',')[i]) for line in file_lines if line)
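So then to average it all out you could just divide the sum by the number of non-empty lines (the same lines that went into the sum, so a trailing empty line is not counted):
average = ith_col_sum / sum(1 for line in file_lines if line)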
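Put together, the steps above could be sketched as a single function. The name column_average and the sample text are illustrative only; the function works on text that has already been read from the file and assumes every non-blank line has a numeric value in column i.

```python
def column_average(text, i):
    """Average column i of comma-separated text, ignoring blank lines."""
    rows = [line.split(",") for line in text.split("\n") if line.strip()]
    return sum(float(row[i]) for row in rows) / len(rows)


sample = "-9,2,12,90\n1423,1,51,-12\n"
print(column_average(sample, 1))  # 1.5
```

Filtering the blank lines once, before both the sum and the count, keeps the two in step.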
Others have pointed out the root cause of your error. Here is a different way to write your method:
import csv
def csv_average(filename, column):
    """ Returns the average of the values in
    column for the csv file """
    column_values = []
    with open(filename) as f:
        reader = csv.reader(f)
        for row in reader:
            # csv.reader yields strings, so convert before summing
            column_values.append(float(row[column]))
    return sum(column_values) / len(column_values)
Let's pick through this code:
def averager(filename):
averager as a name is not as clear as it could be. How about averagecsv, for example?
f=open(filename, "r")
avg=f.readlines()
avg is poorly named. It isn't the average of everything! It's a bunch of lines. Call it csvlines for example.
f.close()
avgr=[]
avgr is poorly named. What is it? Names should be meaningful, otherwise why give them?
final=""
x=0
i=0
while i < range(len(avg[0])):
    while x < range(len(avg)):
As mentioned in comments, you can replace these with for loops, as in for i in range(len(avg[0])):. This saves you from needing to declare and increment the variable in question.
avgr+=str((avg[x[i]]))
Huh? Let's break this line down.
The poorly named avg is our lines from the csv file.
So, we index into avg by x, okay, that would give us the line number x. But... x[i] is meaningless, since x is an integer, and integers don't support array access. I guess what you're trying to do here is... split the file into rows, then the rows into columns, since it's csv. Right?
So let's ditch the code. You want something like this, using the split http://docs.python.org/2/library/stdtypes.html#str.split function:
totalaverage = 0
for col in range(len(csvlines[0].split(","))):
    average = 0
    for row in range(len(csvlines)):
        average += int(csvlines[row].split(",")[col])
    totalaverage += average/len(csvlines)
return totalaverage
BUT wait! There's more! Python has a built in csv parser that is safer than splitting by ,. Check it out here: http://docs.python.org/2/library/csv.html
In response to OP asking how he should go about this in one of the comments, here is my suggestion:
import csv
from collections import defaultdict
with open('numcsv.csv') as f:
    reader = csv.reader(f)
    numbers = defaultdict(list)  # so each column starts with a list we can append to
    for row in reader:
        for column, value in enumerate(row, start=1):
            # convert the value to a float: 1. the number may be a float, and 2. the average needs float division
            numbers[column].append(float(value))
# simple comprehension to print the averages: %d = integer, %f = float. items() goes over key, value pairs
print('\n'.join(["Column %d had average of: %f" % (i,sum(column)/(len(column))) for i,column in numbers.items()]))
Producing
>>>
Column 1 had average of: 2.400000
Column 2 had average of: 2.000000
Column 3 had average of: 1.800000
For a file:
1,2,3
1,2,3
3,2,1
3,2,1
4,2,1
Here are two methods. The first one just gets the average for the line (what your code above looks like it's doing). The second gets the average for a column (which is what your question asked). Note that both work with the lengths of the text, not with its numeric values.
''' This just gets the avg length of a line'''
def averager(filename):
    f=open(filename, "r")
    avg = f.readlines()
    f.close()
    count = 0
    for i in range(len(avg)):  # range replaces Python 2's xrange
        count += len(avg[i])
    return count/len(avg)
''' This gets the avg length over all "columns"
char is what we split on , ; | (etc)
'''
def averager2(filename, char):
    f=open(filename, "r")
    avg = f.readlines()
    f.close()
    count = 0 # count of items
    total = 0 # sum of all the lengths
    for i in range(len(avg)):
        cols = avg[i].split(char)
        count += len(cols)
        for j in range(len(cols)):
            total += len(cols[j].strip()) # Remove line endings
    return total/float(count)
