Using the split() or find() function in python

I am writing a program that opens a file and looks for lines like this:
X-DSPAM-Confidence: 0.8475
I want to use the split() and find() functions to extract these lines and put them in a variable. This is the code I have written:
fname = raw_input("Enter file name: ")
if len(fname) == 0:
    fname = 'mbox-short.txt'
fh = open(fname,'r')
total = 0
for line in fh:
    if not line.startswith("X-DSPAM-Confidence:"): continue
I am just beginning with Python, so please give me something simple that I can understand and that will help me later on.

I think the only wrong part is the not in the if:
fname = raw_input("Enter file name: ")
if len(fname) == 0:
    fname = 'mbox-short.txt'
fh = open(fname,'r')
total = 0
lines = []
for line in fh:
    if line.startswith("X-DSPAM-Confidence:"):
        lines.append(line)

First, receive the input with raw_input():
fname = raw_input("Enter file name: ")
Then check whether the input string is empty:
if not fname:
    fname = 'mbox-short.txt'
Then open the file and read it line by line:
lines = []
with open(fname, 'r') as f:
    for line in f.readlines():
        if line.startswith("X-DSPAM-Confidence:"):
            lines.append(line)
The with open() as f statement ensures that the file object gets closed when you no longer need it (f.close() is called automatically on exiting the with block).
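To see the closing behaviour for yourself, here is a minimal Python 3 sketch. The temporary file and the variable names (path, open_inside, closed_after) are made up for the illustration and are not part of the original program.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("X-DSPAM-Confidence: 0.8475\n")
    path = tmp.name

with open(path) as f:
    first_line = f.readline()
    open_inside = not f.closed   # the file is still open inside the block

closed_after = f.closed          # the file is closed once the block exits
os.remove(path)                  # clean up the throwaway file

print(open_inside, closed_after)
```

Both values should print as True: the file is usable inside the with block and closed right after it, without an explicit close() call.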

I know where this one is coming from as I've done it myself some time ago. As far as I remember you need to calculate the average :)
fname = raw_input("Enter file name: ")
fh = open(fname)
count = 0
sum = 0
for line in fh:
    if not line.startswith("X-DSPAM-Confidence:") : continue
    count = count + 1
    pos = line.find(' ')
    sum = sum + float(line[pos:])
average = sum/count

You're very close, you just need to add a statement below the continue adding the line to a list.
fname = raw_input("Enter file name: ")
if len(fname) == 0:
    fname = 'mbox-short.txt'
fh = open(fname,'r')
total = 0
lines = []
for line in fh:
    if not line.startswith("X-DSPAM-Confidence:"):
        continue
    lines.append(line) # will only execute if the continue is not executed
fh.close()
You should also look at the with keyword for opening files - it's much safer and easier. You would use it like this (I also swapped the logic of your if - saves you a line and a needless continue):
fname = raw_input("Enter file name: ")
if len(fname) == 0:
    fname = 'mbox-short.txt'
total = 0
good_lines = []
with open(fname,'r') as fh:
    for line in fh:
        if line.startswith("X-DSPAM-Confidence:"):
            good_lines.append(line)
If you just want the values, you can do a list comprehension with the good_lines list like this:
values = [ l.split()[1] for l in good_lines ]
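Since the question asks about both split() and find(), here is a small Python 3 sketch of the two approaches applied to one sample line. The variable names are illustrative only.

```python
line = "X-DSPAM-Confidence: 0.8475\n"

# split(): break the line on whitespace; the value is the second piece.
value_from_split = float(line.split()[1])

# find(): locate the colon, then slice everything after it.
# float() ignores the leading space and the trailing newline.
colon = line.find(":")
value_from_find = float(line[colon + 1:])

print(value_from_split, value_from_find)
```

Either approach gives the same number here. split() is shorter, while find() with slicing does not depend on how many words the line contains.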

Related

Python incorrect indent for for-loop (Coursera Python Data Structure courses)

Why does my code print the empty list?
fname = input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
lst = []
for line in fh:
line = line.rstrip()
word = line.split()
if len(word) < 0:
countinue
print(word[1])
the text file can be downloaded here

I found two issues:
Be careful with the indentation: the wrong indent is what produces the empty output.
The trailing space in 'From ' matters for matching the right lines. I accidentally left the space out, which produced duplicated results.
#Assignment 8.5
#file name = mbox-short.txt
fname = input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
for line in fh:
    line = line.rstrip()
    if not line.startswith('From '): # check whether the line starts with 'From '
        continue # note the space after From; without it the results would be duplicated
    word = line.split()
    count = count + 1
    print(word[1]) # be careful with the indent
print("There were", count, "lines in the file with From as the first word")
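The effect of the trailing space can be shown with a few made-up lines. This is only a sketch: the sample lines and addresses below are invented to resemble a mailbox file and are not taken from mbox-short.txt.

```python
# Invented sample lines: one 'From ' envelope line and one 'From:' header line.
sample = [
    "From alice@example.com Sat Jan  5 09:14:16 2008\n",
    "From: alice@example.com\n",
    "Subject: hello\n",
]

# Without the space, both the 'From ' and the 'From:' line match.
loose = [l.split()[1] for l in sample if l.startswith("From")]

# With the space, only the 'From ' line matches.
strict = [l.split()[1] for l in sample if l.startswith("From ")]

print(len(loose), len(strict))
```

The looser prefix picks up the same address twice, which is the duplication described above.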

Outputting email address from text files in Python, looking to make code a little more elegant

I'm working on a Python course on Coursera. One of the assignments is to read input from a text file, extract the email addresses on lines starting with "From:", and then print both email addresses and the number of lines that start with "From:". I got it to work after a little effort, but I wanted to see if my code can be cleaned up.
If I shouldn't worry so much about writing elegant code if I'm just in my second week of Python, you can go ahead and let me know.
fname = input("Enter file name: ")
if len(fname) < 1:
    fname = "mbox-short.txt"
fh = open(fname)
text = fh.read()
lines = text.splitlines()
count = 0
from_lines = list()
for line in lines:
    if 'From:' in line:
        count += 1
        from_lines.append(line)
email = list()
for line in from_lines:
    email = line.split()
    print(email[1])
print("There were", count, "lines in the file with From as the first word")

You can get the work done with only one loop like this:
fname = input("Enter file name: ")
if len(fname) < 1:
    fname = "mbox-short.txt"
fh = open(fname)
lines = fh.readlines() # readlines creates a list of lines from the file
count = 0
for line in lines:
    if 'From:' in line:
        count += 1
        email = line.split()[1]
        print(email)
fh.close() # always close the file after use
print("There were {} lines in the file with From as the first word".format(count))

You never close the file. In any case, you shouldn't be handling files manually; use a context manager instead:
with open(fname) as fh:
    ...
If you are going to iterate over the file line by line, there is no need to read the whole content of the file into a string. Files are iterators themselves, and iterating over them yields the lines:
for line in fh:
There is no need for two loops: if you have just found an email in a line, why save it for later? Just use it right there!
You might check whether the line starts with 'From:' (using startswith) instead of using in.
All together, this might give:
fname = input("Enter file name: ")
if len(fname) < 1:
    fname = "mbox-short.txt"
count = 0
with open(fname) as fh:
    for line in fh:
        if line.startswith('From:'):
            count += 1
            email = line.split()[1]
            print(email)
print("There were", count, "lines in the file with From as the first word")
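Because a file object is just an iterable of lines, the same loop can be tried without a real file. The sketch below wraps the logic in a hypothetical helper, from_addresses, and feeds it an io.StringIO object as a stand-in for the opened file. The helper name and the sample text are assumptions made for the example.

```python
import io

def from_addresses(lines):
    """Return the second word of every line that starts with 'From:'."""
    return [line.split()[1] for line in lines if line.startswith("From:")]

# io.StringIO behaves like an open text file: iterating over it yields lines.
fake_file = io.StringIO(
    "From: alice@example.com\n"
    "Subject: hi\n"
    "From: bob@example.com\n"
)

emails = from_addresses(fake_file)
for email in emails:
    print(email)
print("There were", len(emails), "lines in the file with From as the first word")
```

Passing the result of open(fname) instead of fake_file should work the same way, since both yield one line per iteration.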

Here:
fname = input("Enter file name: ")
if not(fname): fname = "mbox-short.txt"
with open(fname) as fh:
    lines = fh.readlines()
count = 0
from_lines = [line for line in lines if line.startswith('From:')]
for line in from_lines:
    count += 1
    email = line.split()
    print(email[1])
print("There were",count,"lines in the file with From as the first word")
Or...
fname = input("Enter file name: ")
if not(fname): fname = "mbox-short.txt"
with open(fname) as fh:
    lines = fh.readlines()
from_lines = [line for line in lines if line.startswith('From:')]
for line in from_lines: print(line.split()[1])
print(f"There were {len(from_lines)} lines in the file with From as the first word")

Reading >2000 lines text file, but it stops at line 46 that is empty. Why?

This problem only occurs when I include the print line I commented out below.
fname = input("Enter file name: ")
if len(fname) < 1:
    fname = "mbox-short.txt"
fh = open(fname)
i = 0
count = 0
with open(fname, 'r') as fh:
    for line in fh:
        temp = line.split()
        #print(temp[0])
        count+=1
print(count)

When you split an empty or whitespace-only string, you get an empty list:
>>> ''.split()
[]
For this reason, accessing temp[0] raises an IndexError and your processing stops. A blank line read from a file is '\n' rather than '', so test the split result instead of the line itself:
if not temp: # line is blank
    continue

When a line is empty, temp is also empty. There is no temp[0] to print, and Python terminates with an uncaught IndexError.
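The behaviour is easy to reproduce without the original file. This sketch uses io.StringIO with three invented lines, the middle one blank, and skips any line whose split result is empty. Note that a blank line read from a file is "\n", which is a non-empty string, so the check is made on the list returned by split().

```python
import io

# Stand-in for the real file: the second line is blank.
fake_file = io.StringIO("first line\n\nthird line\n")

first_words = []
count = 0
for line in fake_file:
    temp = line.split()
    count += 1
    if not temp:          # blank or whitespace-only line: nothing to index
        continue
    first_words.append(temp[0])

print(count, first_words)
```

All three lines are counted, and no IndexError is raised because temp[0] is only accessed when temp has at least one element.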

counting the lines and extract the floating point values and compute the average of the values

I need to write a program that prompts for a file name, then opens that file and reads through it, looking for lines of the form: X-DSPAM-Confidence: 0.8475
I am stuck on summing the extracted values, counting the lines, and printing the result for the user.
out_number = 'X-DSPAM-Confidence: 0.8475'
Num = 0.0
flag = 0
fileList = list()
fname = input('Enter the file name')
try:
fhand = open(fname)
except:
print('file cannot be opened:',fname)
for line in fhand:
fileList = line.split()
print(fileList)
for line in fileList:
if flag == 0:
pos = out_number.find(':')
Num = out_number[pos + 2:]
print (float(Num))

You have an example line in your code, and when you look through each line in your file, you compute the number in your example line, not in the line from the file.
So, here's what I would do:
import os
import sys
fname = input('Enter the file name: ')
if not os.path.isfile(fname):
    print('file cannot be opened:', fname)
    sys.exit(1)
prefix = 'X-DSPAM-Confidence: '
numbers = []
with open(fname) as infile:
    for line in infile:
        if not line.startswith(prefix): continue
        num = float(line.split(":",1)[1])
        print("found:", num)
        numbers.append(num)
# now, `numbers` contains all the floating point numbers from the file
average = sum(numbers)/len(numbers)
But we can make it more memory-efficient by keeping a running total instead of a list:
import os
import sys
fname = input('Enter the file name: ')
if not os.path.isfile(fname):
    print('file cannot be opened:', fname)
    sys.exit(1)
prefix = 'X-DSPAM-Confidence: '
tot = 0
count = 0
with open(fname) as infile:
    for line in infile:
        if not line.startswith(prefix): continue
        num = float(line.split(":",1)[1]) # convert to float before adding
        tot += num
        count += 1
print("The average is:", tot/count)

Try this:
import re
pattern = re.compile(r"X-DSPAM-Confidence:\s(\d+\.\d+)") # raw string; the dot is escaped so it matches a literal '.'
sum = 0.0
count = 0
fPath = input("file path: ")
with open(fPath, 'r') as f: # use the variable, not the string 'fPath'
    for line in f:
        match = pattern.match(line)
        if match is not None:
            lineValue = match.group(1)
            sum += float(lineValue)
            count += 1
print ("The average is:", sum /count)

fname = input("Enter file name: ")
fh = open(fname)
count=0
x=0
for line in fh:
    if not line.startswith("X-DSPAM-Confidence:") : continue
    x=float(line.split(":")[1].rstrip())+x
    count=count+1
output=x/count
print("Average spam confidence:",output)

Python programming error re: reading from files

I'm taking an online class and we were assigned the following task:
"Write a program that prompts for a file name, then opens that file and reads through the file, looking for lines of the form:
X-DSPAM-Confidence: 0.8475
Count these lines and extract the floating point values from each of the lines and compute the average of those values and produce an output as shown below.
You can download the sample data at http://www.pythonlearn.com/code/mbox-short.txt when you are testing below enter mbox-short.txt as the file name."
The desired output is: "Average spam confidence: 0.750718518519"
Here is the code I've written:
fname = raw_input("Enter file name: ")
fh = open(fname)
inp = fh.read()
for line in inp:
if not line.strip().startswith("X-DSPAM-Confidence: 0.8475") : continue
pos = line.find(':')
num = float(line[pos+1:])
total = float(num)
count = float(total + 1)
print 'Average spam confidence: ', float( total / count )
The output I get is: "Average spam confidence: nan"
What am I missing?

values = []
#fname = raw_input("Enter file name: ")
fname = "mbox-short.txt"
with open(fname, 'r') as fh:
    for line in fh.read().split('\n'): # creating a list of lines
        if line.startswith('X-DSPAM-Confidence:'):
            values.append(line.replace('X-DSPAM-Confidence: ', '')) # I don't know what's after the float value
values = [float(i) for i in values] # need to convert the strings to floats
print 'Average spam confidence: %f' % float( sum(values) / len(values))
I just tested this against the sample data, and it works fine.

# Try the code below; it works.
fname = raw_input("Enter file name: ")
count=0
value = 0
sum=0
fh = open(fname)
for line in fh:
    if not line.startswith("X-DSPAM-Confidence:") : continue
    pos = line.find(':')
    num = float(line[pos+1:])
    sum=sum+num
    count = count+1
print "Average spam confidence:", sum/count

My guess from the question is that 0.8475 is just an example, and that you should be finding all the X-DSPAM-Confidence: lines and reading their numbers.
Also, the indentation of the code you added puts all the calculations outside the for loop. I'm hoping that is just a formatting error from the upload; otherwise that would also be a problem.
As a simplification, you can also skip the
inp = fh.read()
line and just do
for line in fh:
Another thing to look at is that total will only ever hold the last number you read.
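The difference between looping over fh.read() and looping over the file itself can be checked with a short Python 3 sketch. It uses io.StringIO as a stand-in for the opened file, and the sample text is invented for the illustration.

```python
import io

text = "X-DSPAM-Confidence: 0.8475\nSubject: hi\n"

# Iterating over a string (what fh.read() returns) yields single characters.
chars = [c for c in text]

# Iterating over a file-like object yields whole lines.
lines = [l for l in io.StringIO(text)]

# No single character can start with the field name, so nothing ever matches.
char_matches = [c for c in chars if c.startswith("X-DSPAM-Confidence:")]
line_matches = [l for l in lines if l.startswith("X-DSPAM-Confidence:")]

print(len(chars), len(lines), len(char_matches), len(line_matches))
```

With the character loop, the startswith test never succeeds, so no values are ever extracted.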

# Use the file name mbox-short.txt as the file name
fname = raw_input("Enter file name: ")
fh = open(fname)
count = 0
total = 0
for line in fh:
    if not line.startswith("X-DSPAM-Confidence:") : continue
    count = count + 1
    # print count
    num = float(line[20:])
    total +=num
    # print total
average = total/count
print "Average spam confidence:", average

The way you're checking if it is the correct field is too specific. You need to look for the field title without a value (see code below). Also your counting and totaling needs to happen within the loop. Here is a simpler solution that makes use of python's built in functions. Using a list like this takes a little bit more space but makes the code easier to read in my opinion.
How about this? :D
with open(raw_input("Enter file name: ")) as f:
    values = [float(line.split(":")[1]) for line in f.readlines() if line.strip().startswith("X-DSPAM-Confidence")]
print 'Average spam confidence: %f' % (sum(values)/len(values))
My output:
Average spam confidence: 0.750719
If you need more precision on that float: Convert floating point number to certain precision, then copy to String
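As a rough illustration of the precision point, the sketch below formats the expected average from the assignment text with the default %f and with an explicit number of decimal places. The choice of 12 places is only an example.

```python
average = 0.750718518519  # the expected value quoted in the assignment

default_text = "%f" % average            # %f defaults to six decimal places
precise_text = "%.12f" % average         # ask for twelve decimal places
format_text = "{:.12f}".format(average)  # same result using str.format

print(default_text)
print(precise_text)
print(format_text)
```

The default %f rounds the value to six places, which is why the output above reads 0.750719.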
Edit: Since you're new to python that may be a little too pythonic :P Here is the same code expanded out a little bit:
fname = raw_input("Enter file name: ")
values = []
with open(fname) as f:
    for line in f.readlines():
        if line.strip().startswith("X-DSPAM-Confidence"):
            values.append(float(line.split(":")[1]))
print 'Average spam confidence: %f' % (sum(values)/len(values))

fname = raw_input("Enter file name: ")
fh = open(fname)
x_count = 0
total_count = 0
for line in fh:
    if not line.startswith("X-DSPAM-Confidence:") : continue
    line = line.strip()
    x_count = x_count + 1
    num = float(line[20:]) # the value starts at index 20, right after 'X-DSPAM-Confidence: '
    total_count = num + total_count
aver = total_count / x_count
print "average spam confidence:", aver

user_data = raw_input("Enter the file name: ")
lines_list = [line.strip("\n") for line in open(user_data, 'r')]
def find_spam_confidence(data):
    confidence_sum = 0
    confidence_count = 0
    for line in data: # use the parameter rather than the global list
        if line.find("X-DSPAM-Confidence") == -1:
            pass
        else:
            confidence_index = line.find(" ") + 1
            confidence = float(line[confidence_index:])
            confidence_sum += confidence
            confidence_count += 1
    print "Average spam confidence:", str(confidence_sum / confidence_count)
find_spam_confidence(lines_list)

fname = raw_input("Enter file name: ")
fh = open(fname)
c = 0
t = 0
for line in fh:
    if line.startswith("X-DSPAM-Confidence:") :
        c = c + 1
        p = line.find(':')
        n = float(line[p+1:])
        t = t + n
print "Average spam confidence:", t/c

fname = input("Enter file name: ")
fh = open(fname)
count = 0
add = 0
for line in fh:
    if line.startswith("X-DSPAM-Confidence:"):
        count = count+1
        pos = float(line[20:])
        add = add+pos
print("Average spam confidence:", add/count) # divide the running total `add`, not the builtin sum

fname = input('Enter the file name : ') # file name is mbox-short.txt
try:
    fopen = open(fname,'r') # open the file to read through it
except:
    print('Wrong file name') # if the user enters a wrong file name, display 'Wrong file name'
    quit()
count = 0 # number of 'X-DSPAM-Confidence:' lines
total = 0 # sum of the floating point values
for line in fopen: # go through the file line by line
    if line.startswith('X-DSPAM-Confidence:'): # check whether the line starts with 'X-DSPAM-Confidence:'
        count = count + 1 # count the lines that start with 'X-DSPAM-Confidence:'
        strip = line.strip() # remove leading and trailing whitespace from the line
        nline = strip.find(':') # find the position of ':' in the line
        wstring = strip[nline+2:] # extract the decimal value as a string
        fstring = float(wstring) # convert the string to a float
        total = total + fstring # add the value to the running sum in 'total'
print('Average spam confidence:',total/count) # print the average value

total = float(num)
You forgot here to sum the num floats.
It should have been
total = total+num
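The difference between the two assignments can be seen on a short list of numbers. This is a sketch with made-up values, not data from the sample file.

```python
nums = [0.8475, 0.6178, 0.6961]  # made-up values for illustration

overwritten = 0
running = 0
for num in nums:
    overwritten = float(num)   # replaces the previous value on every pass
    running = running + num    # adds to what has been collected so far

print(overwritten)             # only the last number survives
print(running / len(nums))     # the average needs the accumulated sum
```

After the loop, overwritten holds just the final value, while running holds the sum that the average calculation needs.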

fname = input("Enter file name: ")
fh = open(fname)
count=0
avg=0
cal=0
for line in fh:
    if not line.startswith("X-DSPAM-Confidence:") :
        continue
    else:
        count=count+1
        pos = line.find(':')
        num=float(line[pos+1:])
        cal=float(cal+num)
        #print cal,count
avg=float(cal/count)
print ("Average spam confidence:",avg)

It works just fine!
# Use the file name mbox-short.txt as the file name
fname = raw_input("Enter file name: ")
if len(fname) == 0:
    fname = 'mbox-short.txt'
fh = open(fname)
count = 0
tot = 0
ans = 0
for line in fh:
    if not line.startswith("X-DSPAM-Confidence:") : continue
    count = count + 1
    num = float(line[20:]) # the value starts at index 20, right after 'X-DSPAM-Confidence: '
    tot = num + tot
ans = tot / count
print "Average spam confidence:", ans

# Use the file name mbox-short.txt as the file name
fname = raw_input("Enter file name: ")
fh = open(fname,'r')
count=0
avg=0.0
cal=0.00
for line in fh:
    if not line.startswith("X-DSPAM-Confidence:") :
        continue
    else:
        count=count+1
        pos = line.find(':')
        num=float(line[pos+1:])
        cal=cal+num
        #print cal,count
avg=float(cal/count)
print "Average spam confidence:",avg

fname = raw_input("Enter file name: ")
fh = open(fname)
total = 0
count = 0
for line in fh: # iterate over the file itself to get lines, not characters
    if not line.strip().startswith("X-DSPAM-Confidence:"): # match the field name only, not one specific value
        continue
    pos = line.find(':')
    num = float(line[pos+1:])
    total += num
    count += 1
print 'Average spam confidence: ', float( total / count )
