I am learning Python, need some pushing in the right direction - python

I am trying to learn Python through Coursera, and have some questions about an assignment.
Write a program that prompts for a file name, then opens that file and reads through the file, looking for lines of the form:
X-DSPAM-Confidence: 0.8475
Count these lines and extract the floating point values from each of the lines and compute the average of those values and produce an output as shown below. Do not use the sum() function or a variable named sum in your solution.
My code so far is as follows:
fname = raw_input("Enter file name: ")
f = open(fname)
for line in f:
    if not line.startswith("X-DSPAM-Confidence:") : continue
    print line
print "Done"
I am a bit confused. Should I store each line I get into a file or variable or something and then extract the floating point values for each?
How should this be tackled in the simplest way since this is just the beginning of the course?

Here is a code snippet that does the job. It reads the float value from each line that starts with "X-DSPAM-Confidence:", adds the values up, and divides by the count at the end to get the mean. Since you are a beginner, also keep in mind that in Python 2, dividing two integers with / truncates the result; if you expect a float, either the numerator or the denominator must be a float. In the snippet below the accumulated value is already a float, so we won't have that issue.
fname = raw_input("Enter file name: ")
f = open(fname)
cnt = 0
mean_val = 0
for line in f:
    if not line.startswith("X-DSPAM-Confidence:") : continue
    mean_val += float(line.split(':')[1])
    cnt += 1
f.close()
mean_val /= cnt
print mean_val

The reason not to use a variable named sum is because there is a function with the same name.
The assignment asks you to do the work of the sum() function explicitly. You are on the right track with the if not line but might be more successful if you reverse the logic (like this: if line.startswith ...), then put the handling inside an indented block that follows.
The handling you need is to keep track of how many such lines you handle and the accumulated sum. Use a term that is a synonym for sum that is not already a Python identifier. Extract the float value from the end of line and then <your sum variable> += float(the float from "line").
Don't forget to initialize both counter and accumulator before the loop.
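The steps described above could be sketched roughly as follows. This is a minimal Python 3 sketch, not the course's reference solution: the function name average_confidence and the sample lines are made up for illustration, and it takes any iterable of lines (an open file works too) so it can be tried without a file on disk.

```python
def average_confidence(lines, prefix="X-DSPAM-Confidence:"):
    """Average the float values on lines that start with `prefix`."""
    count = 0      # how many matching lines we have seen
    total = 0.0    # running total; deliberately not named `sum`
    for line in lines:
        if line.startswith(prefix):
            # Everything after the prefix is the number; float() ignores
            # the surrounding spaces and the trailing newline.
            total += float(line[len(prefix):])
            count += 1
    if count == 0:
        return None  # no matching lines, so there is no average
    return total / count


# Hypothetical sample data standing in for the real file.
sample = ["a\n", "X-DSPAM-Confidence: 0.8475\n", "b\n", "X-DSPAM-Confidence: 0.8476\n"]
print(average_confidence(sample))
```

With a real file you would pass the file object itself, for example average_confidence(open(fname)).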

with open(fname) as f:
    s = 0
    linecount = 0
    for line in f:
        l = line.split()
        try:
            num = float(l[1])
        except (ValueError, IndexError):
            # skip lines that have no numeric second field
            continue
        if l[0] == 'X-DSPAM-Confidence:':
            s += num
            linecount += 1
    print(s/linecount)
Here's how I would do it. I'll happily answer any questions.

Using a regular expression.
For more information on the re module, see the Python documentation.
Code:
import re
fname = raw_input("Enter file name: ")
f = open(fname)
val_list = []
tot = 0
line_cnt = 0
for line in f:
    a = re.findall(r"X-DSPAM-Confidence:\s*(\d+\.?\d*)",line)
    if len(a) != 0:
        tot += float(a[0])
        line_cnt +=1
print ("Line Count is ",line_cnt)
print ("Average is ",tot/line_cnt)
f.close()
Content of y.txt:
a
X-DSPAM-Confidence: 0.8475
b
X-DSPAM-Confidence: 0.8476
c
X-DSPAM-Confidence: 0.8477
d
X-DSPAM-Confidence: 0.8478
Output:
C:\Users\dinesh_pundkar\Desktop>python c.py
Enter file name: y.txt
Line Count is 4
Average is 0.84765
C:\Users\dinesh_pundkar\Desktop>
Points:
You can open the file using with, as Patrick has done in his answer. If the file is opened using with, there is no need to close it explicitly.
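To see the point about with in action, here is a small Python 3 sketch. It writes a throwaway temporary file (the file and its contents are invented for the demonstration) and checks the file object's closed attribute inside and after the with block.

```python
import os
import tempfile

# Create a throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("X-DSPAM-Confidence: 0.8475\n")
    path = tmp.name

with open(path) as f:
    first_line = f.readline()
    open_inside = not f.closed   # the file is still open inside the block

closed_after = f.closed          # the with block closed it on exit
os.remove(path)                  # clean up the temporary file
print(open_inside, closed_after)
```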

Related

Calculating with data from a txt file in python

Detailed:
I have a set of 200 or so values in a txt file and I want to select the first value b[0] and then go through the list from [1] to [199] and add them together.
So, [0]+[1]
if that's not equal to a certain number, then it would go to the next term i.e. [0]+[2] etc etc until it's gone through every term. Once it's done that it will increase b[0] to b[1] and then goes through all the values again
Step by step:
Select first number in list.
Add that number to the next number
Check if that equals a number
If it doesn't, go to next term and add to first term
Iterate through these until you've gone through all terms/ found a value which adds to target value
If gone through all values, then go to the next term for the starting add value and continue
I couldn't get it to work, if anyone can maybe provide a solution or give some advice? Much appreciated. I've tried looking at videos and other stack overflow problems but I still didn't get anywhere. Maybe I missed something, let me know! Thank you! :)
I've attempted it but gotten stuck. This is my code so far:
b = open("data.txt", "r")
data_file = open("data.txt", "r")
for i, line in enumerate(data_file):
    if (i+b)>2020 or (i+b)<2020:
        b=b+1
    else:
        print(i+b)
        print(i*b)
Error:
Traceback (most recent call last):
File "c:\Users\███\Desktop\ch1.py", line 11, in <module>
if (i+b)>2020 or (i+b)<2020:
TypeError: unsupported operand type(s) for +: 'int' and '_io.TextIOWrapper'
PS C:\Users\███\Desktop>
I would read the file into a list and then convert it into ints before actually dealing with the problem. Files are messy, and the less we have to deal with them the better.
with open("data.txt", "r") as data_file:
    lines = data_file.readlines()  # reads the file into a list of strings
# no explicit close() is needed: the with block closes the file
j = 0  # a list comprehension would be terser, but this is easy to understand
for i in lines:
    lines[j] = int(i.strip())
    j += 1
for i in lines:  # for every value of i we go through every value of j
    # so it would do x = [0] + [0] , [0] + [1] ... [1] + [0] .....
    for j in lines:
        x = j + i
        if x == 2020:
            print(i * j)
Here are some things that you can fix.
You can't add the file object b to the integer i. You have to convert the lines to int by using something like:
integer_in_line = int(line.strip())
Also you have opened the same file twice in read mode with:
b = open("data.txt", "r")
data_file = open("data.txt", "r")
Opening it once is enough.
Make sure that you close the file after you used it:
data_file.close()
To compare each number in the list with each other number in the list you'll need to use a double for loop. Maybe this works for you:
certain_number = 2020
data_file = open("data.txt", "r")
ints = [int(line.strip()) for line in data_file] # make a list of all integers in the file
for i, number_at_i in enumerate(ints): # loop over every integer in the list
    for j, number_at_j in enumerate(ints): # loop over every integer in the list
        if number_at_i + number_at_j == certain_number: # compare the integers to your certain number
            print(f"{number_at_i} + {number_at_j} = {certain_number}")
data_file.close()
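Note that a double loop over the same list also pairs each number with itself and reports every matching pair twice (once in each order). If that is not wanted, one possible variant uses itertools.combinations from the standard library, which yields each pair of distinct positions exactly once. This is only a sketch: the function name find_pair and the sample numbers are made up for illustration.

```python
from itertools import combinations


def find_pair(numbers, target):
    """Return the first pair of entries that adds up to target, or None."""
    for a, b in combinations(numbers, 2):  # each pair of positions once
        if a + b == target:
            return a, b
    return None


# Hypothetical data standing in for the contents of data.txt.
numbers = [1000, 500, 1020, 1520]
print(find_pair(numbers, 2020))  # (1000, 1020)
```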
Your problem is the following: The variables b and data_file are not actually the text that you are hoping they are. You should read something about reading text files in python, there are many tutorials on that.
When you call open("path.txt", "r"), the open function returns a file object, not your text. If you want the text from the file, you should either call read or readlines. Also it is important to close your file after reading the content.
data_file = open("data.txt", "r") # data_file is a file object
text = data_file.read() # text is the actual text in the file in a single string
data_file.close()
Alternatively, you could also read the text into a list of strings, where each string represents one line in the file: lines = data_file.readlines().
I assume that your "data.txt" file contains one number per line, is that correct? In that case, your lines variable will be a list of the numbers, but they will be strings, not integers or floats. Therefore, you can't simply use them to perform calculation. You would need to call int() on them.
Here is an example of how to do it. I assumed that your text file looks like this (with arbitrary numbers):
1
2
3
4
...
file = open("data.txt", "r")
lines = file.readlines()
file.close()
# This creates a new list where the numbers are actual numbers and not strings
numbers = []
for line in lines:
    numbers.append(int(line))
target_number = 2020
starting_index = 0
found = False
for i in range(starting_index, len(numbers)):
    temp = numbers[i]
    for j in range(i + 1, len(numbers)):
        temp += numbers[j] # running total of numbers[i] through numbers[j]
        if temp == target_number:
            print(f'Target number reached by adding numbers from {i} to {j}')
            found = True
            break #This stops the inner loop.
    if found:
        break #This stops the outer loop

Find the line number a string is on in an external text file

I am trying to create a program where it gets input from a string entered by the user and searches for that string in a text file and prints out the line number. If the string is not in the text file, it will print that out. How would I do this? Also I am not sure if even the for loop that I have so far would work for this so any suggestions / help would be great :).
What I have so far:
file = open('test.txt', 'r')
string = input("Enter string to search")
for string in file:
    print("") #print the line number
You can implement this algorithm:
Initialize a counter
Read lines one by one
If the line matches the target, return the current count
Increment the count
If reached the end without returning, the line is not in the file
For example:
def find_line(path, target):
    with open(path) as fh:
        count = 1
        for line in fh:
            if line.strip() == target:
                return count
            count += 1
    return 0
A text file differs from the in-memory structures used in programs (such as dictionaries and arrays) in that it is sequential. Much like the old tapes used for storage a long, long time ago, there's no way to grab/find a specific line without combing through all prior lines (or somehow guessing its exact position in the file). Your best option is just to create a for loop that iterates through each line until it finds the one it's looking for, returning the number of lines traversed up to that point.
file = open('test.txt', 'r')
string = input("Enter string to search")
lineCount = 0
for line in file:
    lineCount += 1
    if string == line.rstrip(): # remove trailing newline
        print(lineCount)
        break
filepath = 'test.txt'
substring = "aaa"
with open(filepath) as fp:
    line = fp.readline()
    cnt = 1
    flag = False
    while line:
        if substring in line:
            print("string found in line {}".format(cnt))
            flag = True
            break
        line = fp.readline()
        cnt += 1
    if not flag:
        print("string not found in file")
If the string will match a line exactly, we can do this in one line (adding 1 because index() counts from 0, while line numbers start at 1):
print(open('test.txt').read().split("\n").index(input("Enter string to search")) + 1)
Well, the above kind of works, except it won't print "no match" if there isn't one. For that, we can just add a little try:
try:
    print(open('test.txt').read().split("\n").index(input("Enter string to search")) + 1)
except ValueError:
    print("no match")
Otherwise, if the string is just somewhere in one of the lines, we can do:
string = input("Enter string to search")
for i, l in enumerate(open('test.txt').read().split("\n"), 1):
    if string in l:
        print("Line number", i)
        break
else:
    print("no match")
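Line numbers are conventionally 1-based, while enumerate and list.index count from 0 unless told otherwise. A minimal sketch of a substring search that reports 1-based line numbers is shown below; the name find_line_number and the sample lines are hypothetical, and the function accepts any iterable of lines, including an open file.

```python
def find_line_number(lines, needle):
    """Return the 1-based number of the first line containing needle, or None."""
    for number, line in enumerate(lines, start=1):  # start counting at 1
        if needle in line:
            return number
    return None  # reached the end without a match


# Hypothetical stand-in for the lines of test.txt.
sample = ["alpha\n", "beta\n", "gamma\n"]
print(find_line_number(sample, "beta"))
print(find_line_number(sample, "zeta"))
```

Returning None rather than printing keeps the "not found" decision with the caller, who can then print whatever message fits.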

I can't properly define a function in Python. Can anyone tell me where I am going wrong?

The directions are to:
Write a function called calc_average that takes a string representing a filename of the following format, for example:
Smith 82
Jones 75
Washington 91
The function should calculate and return the class average from the data in the file.
So far I have:
def calc_average():
    infile = open("filename.txt", "r")
    readlines = infile.readlines()
    for line in readlines:
        parts = line.split()
        name = parts[0]
        clsavg = parts[1]
        average = 0
I have tried to work past this part, but I can't find the right way to do the average. Any suggestions? I only need the function and the file is only an example.
def calc_average():
    infile = open("filename.txt", "r")
    readlines = infile.readlines()
    for line in readlines:
        parts = line.split()
        name = parts[0]
        clsavg = parts[1]
        average = 0
After all that, you're just literally sending something into the function, but not asking for anything to come out.
Using return will help get something out of the function.
return [variable] is a way to use it.
Here:
Add this line
return [variable] to the end of your code, such that it looks like this:
def calc_average():
    infile = open("filename.txt", "r")
    readlines = infile.readlines()
    for line in readlines:
        parts = line.split()
        name = parts[0]
        clsavg = parts[1]
        average = 0
    return variable #where you replace variable with
                    #the thing you want to get out of your function
To call this function (or should I say "run" it), just write its name followed by parentheses, dedented so that it sits outside the function body.
def calc_average():
    infile = open("filename.txt", "r")
    readlines = infile.readlines()
    for line in readlines:
        parts = line.split()
        name = parts[0]
        clsavg = parts[1]
        average = 0
    return variable
calc_average() #<- this calls the function
You might also want to read up on parameters:
Parameters are values passed into a function for it to use.
Example:
def test1(number):
    number = number + 1 #this can be written as number += 1 as well
    return number
x = test1(5)
First I define the function with a number parameter. This would mean that number will be used in this function. Notice how the lines below def test1(number) also use the variable number. whatever is passed into the function as number will be considered number in the function.
Then, I call the function and use 5 as a parameter.
When it's called, the function takes 5 (since that was the input parameter) and stores the variable number as 5.(from def test1(number)) Thus, It's like writing number = 5 in the function itself.
Afterwards, return number will take the number (which in this case is added to become 6, number = 6) and give it back to the outside code. Thus, it's like saying return 6.
Now back to the bottom few lines. x = test1(5) will make x = 6, since the function returned 6.
Hope I helped you understand functions more.
The function needs an argument. It also needs to return the average, so there should be a return statement at the end.
def calc_average(file_name):
    ...
    return <something>
def calc_average():
    infile = open("filename.txt", "r")
    readlines = infile.readlines()
    clsavg = 0
    counter = 0
    for line in readlines:
        parts = line.split()
        clsavg = clsavg+ float(parts[1])
        counter = counter + 1
    print clsavg/counter
To continue using mostly your own code and your example, you could do:
def calc_average(filename):
    infile = open(filename, "r")
    readlines = infile.readlines()
    average = 0 # use this to sum all grades
    index = 0 # keep track of how many records or lines in our txt
    for line in readlines:
        parts = line.split()
        name = parts[0] # basically useless
        # for each line in txt we sum all grades
        average = average + float(parts[1]) # need to convert this string to a float
        index = index + 1
    # divide by the number of entries in your txt file to get the class average
    return average / index
then:
calc_average('grades.txt')
returns:
82.66666666666667
Alright, let's look at this, part by part:
Write a function called calc_average
def calc_average():
that takes a string representing a filename
let's make this a meaningful variable name:
def calc_average(filename):
Now that we have the basics down, let's talk about how to actually solve your problem:
Each line in your file contains a name and a grade. You want to keep track of the grades so that you can compute the average out of them. So, you need to be able to:
read a file one line at a time
split a line and take the relevant part
compute the average of the relevant parts
So, it seems that holding the relevant parts in a list would be helpful. We would then need a function that computes the average of a list of numbers. So let's write that
def average(L):
    sum = 0
    for num in L:
        sum += num
    return sum/len(L)
Of course, there's an easier way to write this:
def average(L):
    return sum(L)/len(L) # python has a built-in sum function to compute the sum of a list
Now that we have a function to compute the average, let's read the file and create a list of numbers whose average we want to compute:
def read_from_file(filename):
    answer = [] # a list of all the numbers in the file
    with open(filename) as infile: # open the file to read
        for line in infile:
            parts = line.split()
            grade = int(parts[-1]) # the integer value of the last entity in that line
            answer.append(grade)
    return answer
Now that we have a function that returns the relevant information from a file, and a function that computes averages, we just have to use the two together:
def calc_average(filename):
    numbers = read_from_file(filename)
    answer = average(numbers)
    return answer
Now, you might notice that you don't need to keep track of each number, as you just sum them up and divide by the number of numbers. So, this can be done more concisely as follows:
def calc_average(filename):
    nums = 0
    total = 0
    with open(filename) as infile:
        for line in infile:
            total += int(line.split()[-1])
            nums += 1
    return total/nums
You didn't calculate the average, and you didn't return anything from the function calc_average. So try this:
def calc_average():
    with open('filename.txt') as text:
        nums = [int(i.split()[1]) for i in text]
        avg = float(sum(nums)) / float(len(nums))
    return avg
>>>print(calc_average())
82.6666666667
First thing you're doing wrong is not marking the source code in your post, as source code. Put it on separate lines to the rest of your post, and use the {} link at the top of the editor to mark it as source code. Then it should come out like this:
def calc_average():
    infile = open("filename.txt", "r")
    readlines = infile.readlines()
    for line in readlines:
        parts = line.split()
        name = parts[0]
        clsavg = parts[1]
        average = 0
You should do the same with the file contents: I am assuming that you have one name and one number per line.
If you want to put a snippet of code inline with your text, e.g. "the foo() function", put a backtick each side of the code. The backtick is like an accent grave, and is sometimes very wrongly used as an opening quote char in text files.
Next, you were to write a function that takes a string containing a filename. But you have
def calc_average():
    infile = open("filename.txt", "r")
That doesn't take anything. How about
def calc_average(filename):
    infile = open(filename, "r")
Now, what your function is doing is, reading the lines ok, splitting them into a name and a number -- but both are still strings, and the string containing the number is put in the variable clsavg, and then just setting the variable average to 0 every time a line is read.
But what do you want to do? I think when you say "class average", these are all people in a class, the numbers are their scores, and you want to calculate the average of the numbers? So that means adding up all the numbers, and dividing by the number of rows in the file.
So you need to set a variable to 0 ONCE, before the loop starts, then on each time you read a line, increment it by the number value. I would imagine the clsavg variable would be the one to use. So you need to convert parts[1] to an integer. You can use int() for that. Then you need to increment it, with += or with a statement like x = x + y. Ask google if you want more details. That way you build up a total value of all the numbers. Finally, after the loop is finished (meaning on a line that is only indented as far as the for), you need to divide the total by the number of rows. That would be the number of elements in readlines. You should google the len() function. Division uses the / operator. You can use x /= y to set x to the value of x/y.
That's making lots of assumptions: that you want an integer average, that every line in the file has the name and number (no blank lines or comments etc.) By the way, you can use float() instead of int() if you want more precision.
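Putting those steps together, one possible version looks like the sketch below. It is written for Python 3 (where / gives a float result) and makes the same assumptions as above: every line holds a name and an integer score, with no blank lines. The temporary file exists only so the example can be run as is; its contents are the three sample rows from the question.

```python
import os
import tempfile


def calc_average(filename):
    """Return the mean of the second field on each line of the file."""
    total = 0                       # set once, before the loop starts
    with open(filename, "r") as infile:
        rows = infile.readlines()
    for line in rows:
        parts = line.split()
        total += int(parts[1])      # convert the score string to a number
    return total / len(rows)        # after the loop: divide by the row count


# Write the sample data from the question to a throwaway file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Smith 82\nJones 75\nWashington 91\n")
    path = tmp.name

result = calc_average(path)
os.remove(path)                     # clean up the temporary file
print(result)
```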

Adding Up Numbers From A List In Python

basically i'm trying to complete a read file. i have made the "make" file that will generate 10 random numbers and write it to a text file. here's what i have so far for the "read" file...
def main():
    infile = open('mynumbers.txt', 'r')
    nums = []
    line = infile.readline()
    print ('The random numbers were:')
    while line:
        nums.append(int(line))
        print (line)
        line = infile.readline()
    total = sum(line)
    print ('The total of the random numbers is:', total)
main()
i know it's incomplete, i'm still a beginner at this and this is my first introduction to computer programming or python. basically i have to use a loop to gather up the sum of all the numbers that were listed in the mynumbers.txt. any help would be GREATLY appreciated. this has been driving me up a wall.
You don't need to iterate manually in Python (this isn't C, after all):
nums = []
with open("mynumbers.txt") as infile:
    for line in infile:
        nums.append(int(line))
Now you just have to take the sum, but of nums, of course, not of line:
total = sum(nums)
The usual one-liner:
total = sum(map(int, open("mynumbers.txt")))
It does generate a list of integers (albeit very temporarily).
Although I would go with Tim's answer above, here's another way if you want to use the readlines method:
# Open a file
infile = open('mynumbers.txt', 'r')
sum = 0
lines = infile.readlines()
for num in lines:
    sum += int(num)
print sum
Just another solution... :-)
with open("x.txt") as file:
    total = sum(int(line) for line in file)
This solution sums the "results" of a generator object so it isn't memory intensive yet short and elegant (pythonic).
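One caveat with summing int(line) directly is that a blank line (for example a stray empty line at the end of the file) makes int() raise ValueError. A hedged variant that skips blank lines is sketched below in Python 3; io.StringIO stands in for the real file so the example is self-contained, and the name total_from_lines and the sample numbers are invented for illustration.

```python
import io


def total_from_lines(lines):
    """Sum the integers found one per line, ignoring blank lines."""
    # The generator expression feeds sum() one value at a time,
    # so no intermediate list is built.
    return sum(int(line) for line in lines if line.strip())


# Hypothetical file contents, including one blank line.
fake_file = io.StringIO("3\n10\n\n7\n")
total = total_from_lines(fake_file)
print(total)  # 20
```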

Python while loop issues

Infile is a genealogy:
holla 1755
ronaj 1781
asdflæj 1803
axle 1823
einar 1855
baelj 1881
æljlas 1903
jobbi 1923
gurri 1955
kolli 1981
Rounaj 2004
I want to print out every generation time from infile and in the end I want the average. Here I think my issue is that line2 gets out of range when the infile ends:
def main():
    infile = open('infile.txt', 'r')
    line = infile.readline()
    tmpstr = line.split('\t')
    age=[]
    while line !='':
        line2 = infile.readline()
        tmpstr2 = line2.split('\t')
        age.append(int(tmpstr2[1]) - int(tmpstr[1]))
        print age
        tmpstr = tmpstr2
    infile.close()
    print sum(age)*1./len(age)
main()
So I decided to read all information to a list but tmpstr doesn´t change value here:
def main():
    infile = open('infile.txt', 'r')
    line = infile.readline()
    age=[]
    while line !='':
        tmpstr = line.split('\t')
        age.append(tmpstr[1])
        print age
    infile.close()
    print sum(age)*1./len(age)
main()
How come? What's wrong with these two scripts? Why am I writing main() two times?
Any ideas how these two can be solved?
Thanx all, this is how it ended up:
def main():
    with open('infile.txt', 'r') as input:
        ages = []
        for line in input:
            data = line.split()
            age = int(data[1])
            ages.append(age)
    gentime = []
    for i in xrange(len(ages)-1):
        print ages[i+1] - ages[i]
        gentime.append(ages[i+1] - ages[i])
    print 'average gentime is', sum(gentime)*1./len(gentime)
main()
Try this:
def main():
    with open('infile.txt', 'r') as input:
        ages, n = 0, 0
        for line in input:
            age = int(line.split()[1])
            ages += age
            n += 1
            print age
    print 'average:', float(ages) / n
Some comments:
You don't need to use a list for accumulating the numbers, a couple of local variables are enough
In this case it's a good idea to use split() without arguments, in this way you'll process the input correctly when the name is separated from the number in front of it by spaces or tabs
It's also a good idea to use the with syntax for opening a file and making sure that it gets closed afterwards
With respect to the final part of your question, "Why am I writing main() two times?" that's because the first time you're defining the main function and the second time you're calling it.
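The difference between defining and calling can be seen in a tiny Python 3 sketch. The list named ran is just a hypothetical way to record whether the function body has executed.

```python
ran = []  # records each time the body of main() actually executes


def main():
    # This body does not run when the def statement is read;
    # it runs only when main() is called.
    ran.append("called")


before = len(ran)  # still 0: defining main() did not run it
main()             # the call is what executes the body
after = len(ran)   # now 1
print(before, after)
```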
You can iterate over the entire contents of the file using this statement:
for line in infile:
# Perform the rest of your steps here
You wouldn't want to use a while loop, unless you had some sort of counter to switch index locations (i.e. you used infile.readlines() and wanted to use a while loop for that).
In the second instance, your code only reads a single line from the file.
Something simpler, like:
age = []
with open('data.txt', 'rt') as f:
    for line in f:
        vals = line.split('\t')
        age.append(int(vals[1]))
print sum(age) / float(len(age))
generates
1878.54545455
You can try something like this:
if __name__ == "__main__":
    file = open("infile.txt", "r")
    lines = file.readlines()
    gens = [int(line.split('\t')[1]) for line in lines]
    avg = sum(gens)/len(gens)
The first line is the native entrance for python into a program. It is equivalent to C's "int main()".
Next, it's probably easiest to set up for list comprehensions if you read all lines from the file into a list.
The 4th line iterates through the file lines splitting them at the tab and only retrieving the 2nd item (at index 1) from the newly split list.
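The problem with both of these scripts is the while loop condition: line is never reassigned inside the loop, so line != '' can never become false unless the first line is empty. The second script therefore loops forever, and the first one stops only because it crashes with an IndexError once readline() returns an empty string at the end of the file.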
You could fix this, but it's better to use the Python idiom:
lastyear = None
ages = []
for line in infile:
    _name, year = line.split('\t')
    year = int(year)
    if lastyear:
        ages.append(year - lastyear)
    lastyear = year
print float(sum(ages))/len(ages)
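Another way to get the differences between consecutive years, once they are in a list, is to pair the list with a copy of itself shifted by one position using zip. This is a Python 3 sketch rather than a fix to the scripts above; the function name generation_times is made up, and the years are copied from the sample file in the question.

```python
def generation_times(years):
    """Return the gaps between each year and the one that follows it."""
    # zip(years, years[1:]) yields (1755, 1781), (1781, 1803), ...
    return [later - earlier for earlier, later in zip(years, years[1:])]


# Years taken from the genealogy in the question.
years = [1755, 1781, 1803, 1823, 1855, 1881, 1903, 1923, 1955, 1981, 2004]
gaps = generation_times(years)
average_gap = sum(gaps) / len(gaps)
print(gaps)
print(average_gap)
```

Because zip stops at the shorter sequence, there is no need to handle the last line specially, which is where the original while loop ran off the end of the file.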
