Python while loop issues - python

Infile is a genealogy:
holla 1755
ronaj 1781
asdflæj 1803
axle 1823
einar 1855
baelj 1881
æljlas 1903
jobbi 1923
gurri 1955
kolli 1981
Rounaj 2004
I want to print out every generation time from infile and in the end I want the average. Here I think my issue is that line2 gets out of range when the infile ends:
def main():
    infile = open('infile.txt', 'r')
    line = infile.readline()
    tmpstr = line.split('\t')
    age=[]
    while line !='':
        line2 = infile.readline()
        tmpstr2 = line2.split('\t')
        age.append(int(tmpstr2[1]) - int(tmpstr[1]))
        print age
        tmpstr = tmpstr2
    infile.close()
    print sum(age)*1./len(age)
main()
So I decided to read all the information into a list, but tmpstr doesn't change value here:
def main():
    infile = open('infile.txt', 'r')
    line = infile.readline()
    age=[]
    while line !='':
        tmpstr = line.split('\t')
        age.append(tmpstr[1])
        print age
    infile.close()
    print sum(age)*1./len(age)
main()
How come? What's wrong with these two scripts? Why am I writing main() two times?
Any ideas how these two can be solved?
Thanx all, this is how it ended up:
def main():
    with open('infile.txt', 'r') as input:
        ages = []
        for line in input:
            data = line.split()
            age = int(data[1])
            ages.append(age)
    gentime = []
    for i in xrange(len(ages)-1):
        print ages[i+1] - ages[i]
        gentime.append(ages[i+1] - ages[i])
    print 'average gentime is', sum(gentime)*1./len(gentime)
main()

Try this:
def main():
    with open('infile.txt', 'r') as input:
        ages, n = 0, 0
        for line in input:
            age = int(line.split()[1])
            ages += age
            n += 1
            print age
    print 'average:', float(ages) / n
Some comments:
You don't need to use a list for accumulating the numbers, a couple of local variables are enough
In this case it's a good idea to use split() without arguments, in this way you'll process the input correctly when the name is separated from the number in front of it by spaces or tabs
It's also a good idea to use the with syntax for opening a file and making sure that it gets closed afterwards
With respect to the final part of your question, "Why am I writing main() two times?" that's because the first time you're defining the main function and the second time you're calling it.

You can iterate over the entire contents of the file using this statement:
for line in infile:
    # Perform the rest of your steps here
You wouldn't want to use a while loop, unless you had some sort of counter to switch index locations (i.e. you used infile.readlines() and wanted to use a while loop for that).

In the second instance, your code only reads a single line from the file.
Something simpler, like:
age = []
with open('data.txt', 'rt') as f:
    for line in f:
        vals = line.split('\t')
        age.append(int(vals[1]))
print sum(age) / float(len(age))
generates
1878.54545455

You can try something like this:
if __name__ == "__main__":
    file = open("infile.txt", "r")
    lines = file.readlines()
    gens = [int(line.split('\t')[1]) for line in lines]
    avg = float(sum(gens)) / len(gens)
The first line is the conventional entry-point guard in Python: the block runs only when the file is executed as a script, which makes it roughly analogous to C's "int main()".
Next, it's probably easiest to set up for a list comprehension if you read all lines from the file into a list.
The 4th line iterates through the lines, splitting each at the tab and keeping only the 2nd item (at index 1) of the split result.

The problem with both of these scripts is that your while loop is infinite. The condition line != '' will never be false unless the first line is empty.
You could fix this, but it's better to use the Python idiom:
lastyear = None
ages = []
for line in infile:
    _name, year = line.split('\t')
    year = int(year)
    if lastyear:
        ages.append(year - lastyear)
    lastyear = year
print float(sum(ages))/len(ages)
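The pairing of each year with the next one can also be sketched with zip. This is a minimal Python 3 illustration, not any poster's code: the function name generation_times and the in-memory sample list are hypothetical, and it assumes every non-blank line has a name followed by a year.

```python
def generation_times(lines):
    """Return the differences between consecutive years in the input."""
    # split() with no argument handles both tabs and spaces
    years = [int(line.split()[1]) for line in lines if line.strip()]
    # zip pairs each year with the one that follows it
    return [later - earlier for earlier, later in zip(years, years[1:])]


# A few lines from the question, held in memory instead of a file
sample = ["holla\t1755", "ronaj\t1781", "asdflæj\t1803", "axle\t1823"]
gaps = generation_times(sample)
print(gaps)  # [26, 22, 20]
print("average generation time:", sum(gaps) / len(gaps))
```

With a real file, the same function could be called as generation_times(open_file), since a file object yields its lines when iterated.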

Related

What am I missing here? Basic file i/o & string.find()

I've got this text file which lists all my movies and I thought I'd put them into a database of sorts. So first steps, read the text file, do some small manipulation and rewrite the file.
So, some lines contain multiple movie names separated by "AKA" so I need to turn that into two separate lines before I write it to the new file.
I have now struck two problems:
Losing the first line in the output file (which I solved by using an additional read/write outside the main loop, omitting the check for "AKA" at this stage).
The if statement checking the result of the string.find("AKA") is never triggered.
The code is here:
fd1 = open("Movies_List_2.txt", "r")
fd2 = open("Movies_List_3.txt", "w")
inp_line = fd1.read() # This is to fix the missing first line in the output file
fd2.write(inp_line.strip() + "\n")
for line in fd1:
    inp_line = fd1.read()
    x = inp_line.find("ÄKA")
    if x == -1: # <=== This never triggers
        l = len(inp_line)
        extn = inp_line[l-3:]
        year = inp_line[l-8:l-4]
        inp_line2 = inp_line[x+4:l-9]
        inp_line = inp_line[:x-1]
        fd2.write(inp_line + " " + year + "." + extn + "\n")
        fd2.write(inp_line + " " + year + "." + extn + "]n")
    else:
        fd2.write(inp_line)
fd1.close()
fd2.close()
And a sample of the input file is here:
20 Million Miles to Earth 1957
20,000 Leagues Under the Sea 1954
2001 A Space Odyssey 1968
2010 The Year We Make Contact 1981
2017 AKA Shockwave 2017 <====== This should trigger the test
2036 Origin Unknown 2018
2046 2004
2050 2018
2067 2020
I'm almost certain that there is a fundamental I'm missing here, but I've spent a lot of time over it with no success.
Can someone point out where this code is going wrong?
This
fd1 = open("Movies_List_2.txt", "r")
...
for line in fd1:
    inp_line = fd1.read()
...
fd1.close()
looks like you are mixing two different ways to read a text file in Python. These are:
Reading everything at once. For example, if I want to know the number of e letters, I can do:
f = open(filename, "r")
content = f.read()
f.close()
print(content.count("e"))
Processing the file line by line. For example, if I want to know the number of e letters in each line, I can do:
f = open(filename, "r")
for line in f:
    print(line.count("e"))
f.close()
Note that open can also be used as a context manager, which, simply speaking, handles closing for you. The following is my first example using that feature:
with open(filename, "r") as f:
    content = f.read()
print(content.count("e"))
Setting aside other dubious code in your script ("ÄKA" and "AKA" are not the same string), your test never triggers because you never loop inside the for.
inp_line = fd1.read() will read the whole file inside inp_line, not just one line! This causes the file descriptor to reach the end of file, so when you try to get the "next" line in:
for line in fd1:
the StopIteration triggers and you never actually iterate.
The solution is to just iterate line-by-line in the for loop:
fd1 = open("Movies_List_2.txt", "r")
fd2 = open("Movies_List_3.txt", "w")
for line in fd1:
    x = line.find("ÄKA")
    if x == -1: # <=== This never triggers
        ...
(Note that you need to restructure your loop to use line instead of inp_line.)
I think this is the solution you need. It is just a small fix.
fd1 = open("Movies_List_2.txt", "r")
fd2 = open("Movies_List_3.txt", "w")
for line in fd1:
    line = line.strip()
    if line.find("AKA")>0:
        print(line.split('AKA'))
        line_a,line_b = line.split('AKA')
        fd2.write(line_a+"\n")
        fd2.write(line_b.strip()+"\n")
    else:
        fd2.write(line+"\n")
fd1.close()
fd2.close()
How I would do it:
# No need to remember to close the files; the with blocks do it.
with open("Movies_List_2.txt") as fdIn:
    with open("Movies_List_3.txt", "w") as fdOut:
        # The thing...
        for line in fdIn: # Read file line by line
            line = line.strip()
            # Single line movies. Just copy
            if 'AKA' not in line:
                print(line, file=fdOut)
                continue
            # Lines with multiple movies
            head, year = line[:-5], line[-4:]
            for movie in head.split('AKA'): # Split movies
                movie = movie.strip()
                print(f'{movie} {year}', file=fdOut) # Write movie adding the year
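The splitting step on its own can be sketched as a small function that is easy to try without touching any files. This is a hedged Python 3 illustration: split_aka is a hypothetical name, and it assumes the year is always the last space-separated field and that "AKA" appears as a separate word.

```python
def split_aka(line):
    """Return one 'title year' entry per title found on the line."""
    line = line.strip()
    if " AKA " not in line:
        return [line]  # single-title line: keep as is
    # Assumption: the year is the last space-separated field
    head, year = line.rsplit(" ", 1)
    return [f"{title.strip()} {year}" for title in head.split(" AKA ")]


print(split_aka("2017 AKA Shockwave 2017"))  # ['2017 2017', 'Shockwave 2017']
print(split_aka("2046 2004"))                # ['2046 2004']
```

Matching on " AKA " with surrounding spaces avoids splitting a title that merely contains those three letters inside a longer word.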

I want to replace words in a file by line number using Python; I have a list of line numbers

if I have a file like:
Flower
Magnet
5001
100
0
and I have a list containing line number, which I have to change.
list =[2,3]
How can I do this using python and the output I expect is:
Flower
Most
Most
100
0
Code that I've tried:
f = open("your_file.txt","r")
line = f.readlines()[2]
print(line)
if line=="5001":
    print "yes"
else:
    print "no"
but it is not able to match.
I want to overwrite the file which I am reading.
You may simply loop through the list of indices that you have to replace in your file (my original answer needlessly looped through all lines in the file):
with open('test.txt') as f:
    data = f.read().splitlines()
replace = {1,2}
for i in replace:
    data[i] = 'Most'
print('\n'.join(data))
Output:
Flower
Most
Most
100
0
To overwrite the file you have opened with the replacements, you may use the following:
with open('test.txt', 'r+') as f:
    data = f.read().splitlines()
    replace = {1,2}
    for i in replace:
        data[i] = 'Most'
    f.seek(0)
    f.write('\n'.join(data))
    f.truncate()
The reason you're having this problem is that when you read a line from a file opened in Python, you also get the newline character (\n) at the end. To solve this, you can use the str.strip() method, which removes leading and trailing whitespace, including the newline.
E.g.
f = open("your_file.txt","r")
line = f.readlines()
lineToCheck = line[2].strip()
if(lineToCheck == "5001"):
    print("yes")
else:
    print("no")
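The effect of the trailing newline is easy to see with repr(). The sketch below is Python 3 and uses io.StringIO as a stand-in for the real file so that it runs on its own; the variable names are illustrative only.

```python
import io

# io.StringIO stands in for the real file so the example is self-contained
fake_file = io.StringIO("Flower\nMagnet\n5001\n100\n0\n")
lines = fake_file.readlines()

third = lines[2]
print(repr(third))              # '5001\n' : the newline is part of the string
print(third == "5001")          # False
print(third.strip() == "5001")  # True
```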

How do I read a file line by line and print the line that have specific string only in python?

I have a text file containing these lines
wbwubddwo 7::a number1 234 **
/// 45daa;: number2 12
time 3:44
I am trying to print for example if the program find string number1, it will print 234
I start with simple script below but it did not print what I wanted.
with open("test.txt", "rb") as f:
    lines = f.read()
    word = ["number1", "number2", "time"]
    if any(item in lines for item in word):
        val1 = lines.split("number1 ", 1)[1]
        print val1
This returns the following result:
234 **
/// 45daa;: number2 12
time 3:44
Then I tried changing f.read() to f.readlines() but this time it did not print out anything.
Does anyone know other way to do this? Eventually I want to get the value for each line for example 234, 12 and 3:44 and store it inside the database.
Thank you for your help. I really appreciate it.
Explanations given below:
with open("test.txt", "r") as f:
    lines = f.readlines()
stripped_lines = [line.strip() for line in lines]
words = ["number1", "number2", "time"]
for a_line in stripped_lines:
    for word in words:
        if word in a_line:
            number = a_line.split()[1]
            print(number)
1) First of all, 'rb' gives a bytes object, i.e. something like b'number1 234' would be returned. Use 'r' to get a string object.
2) The lines you read will be stored in a list that looks something like this:
['number1 234\r\n', 'number2 12\r\n', '\r\n', 'time 3:44']
Notice the \r\n sequences; they mark the line endings. To remove them, use strip().
3) Take each line from stripped_lines and each word from words, and check whether that word is present in the line using in.
4) a_line would be number1 234, but we only want the number part, so we call split(). The output of that would be ['number1','234'], and split()[1] means the element at index 1 (the 2nd element).
5) You can also check whether a string is a digit using your_string.isdigit()
UPDATE: Since you updated your question and input file this works:
import time
def isTimeFormat(input):
    try:
        time.strptime(input, '%H:%M')
        return True
    except ValueError:
        return False
with open("test.txt", "r") as f:
    lines = f.readlines()
stripped_lines = [line.strip() for line in lines]
words = ["number1", "number2", "time"]
for a_line in stripped_lines:
    for word in words:
        if word in a_line:
            number = a_line.split()[-1] if (a_line.split()[-1].isdigit() or isTimeFormat(a_line.split()[-1])) else a_line.split()[-2]
            print(number)
Why the isTimeFormat() function?
def isTimeFormat(input):
    try:
        time.strptime(input, '%H:%M')
        return True
    except ValueError:
        return False
It checks whether a value such as 3:44 or 4:55 is in a time format, since you are treating those as values too.
Final output:
234
12
3:44
After some trial and error, I found the solution below. It is based on the answer provided by @s_vishnu.
with open("test.txt", "r") as f:
    lines = f.readlines()
stripped_lines = [line.strip() for line in lines]
for item in stripped_lines:
    if "number1" in item:
        getval = item.split("number1 ")[1].split(" ")[0]
        print getval
    if "number2" in item:
        getval2 = item.split("number2 ")[1].split(" ")[0]
        print getval2
    if "time" in item:
        getval3 = item.split("time ")[1].split(" ")[0]
        print getval3
output
234
12
3:44
This way, I can also do other things for example saving each data to a database.
I am open to any suggestion to further improve my answer.
You're overthinking this. Assuming you don't have those two asterisks at the end of the first line and you want to print out lines containing a certain value(s), you can just read the file line by line, check if any of the chosen values match and print out the last value (value between a space and the end of the line) - no need to parse/split the whole line at all:
search_values = ["number1", "number2", "time"] # values to search for
with open("test.txt", "r") as f: # open your file
    for line in f: # read it line by line
        if any(value in line for value in search_values): # check for search_values in line
            print(line[line.rfind(" ") + 1:].rstrip()) # print the last value after space
Which will give you:
234
12
3:44
If you do have asterisks you have to more precisely define your file format as splitting won't necessarily yield you your desired value.
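One way to pin the format down is to treat the value as the whitespace-separated token that immediately follows the keyword. The Python 3 sketch below assumes exactly that rule; value_after is a hypothetical helper name, and the sample lines are the ones from the question.

```python
def value_after(line, keywords):
    """Return the token that follows the first keyword found, else None."""
    tokens = line.split()
    # Stop one token early: a keyword in last position has no value after it
    for i, token in enumerate(tokens[:-1]):
        if token in keywords:
            return tokens[i + 1]
    return None


sample = [
    "wbwubddwo 7::a number1 234 **",
    "/// 45daa;: number2 12",
    "time 3:44",
]
values = [value_after(line, {"number1", "number2", "time"}) for line in sample]
print(values)  # ['234', '12', '3:44']
```

Because the value is located relative to the keyword, the trailing asterisks on the first line do not affect the result.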

I am learning Python, need some pushing in the right direction

I am trying to learn Python through Coursera, and have some questions about an assignment.
Write a program that prompts for a file name, then opens that file and reads through the file, looking for lines of the form:
X-DSPAM-Confidence: 0.8475
Count these lines and extract the floating point values from each of the lines and compute the average of those values and produce an output as shown below. Do not use the sum() function or a variable named sum in your solution.
My code so far is as follows:
fname = raw_input("Enter file name: ")
f = open(fname)
for line in f:
    if not line.startswith("X-DSPAM-Confidence:") : continue
    print line
print "Done"
I am a bit confused. Should I store each line I get into a file or variable or something and then extract the floating point values for each?
How should this be tackled in the simplest way since this is just the beginning of the course?
Here is a code snippet that does the job. I read the float values from the lines starting with "X-DSPAM-Confidence:", add them up, and take the mean at the end. Also, since you are a beginner, keep in mind that when you divide in Python 2 and expect a float, either the numerator or the denominator must be a float. Since the accumulated number in the snippet below is a float, we won't have that issue.
fname = raw_input("Enter file name: ")
f = open(fname)
cnt = 0
mean_val = 0
for line in f:
    if not line.startswith("X-DSPAM-Confidence:") : continue
    mean_val += float(line.split(':')[1])
    cnt += 1
f.close()
mean_val /= cnt
print mean_val
The reason not to use a variable named sum is because there is a function with the same name.
The assignment asks you to do the work of the sum() function explicitly. You are on the right track with the if not line but might be more successful if you reverse the logic (like this: if line.startswith ...), then put the handling inside an indented block that follows.
The handling you need is to keep track of how many such lines you handle and the accumulated sum. Use a term that is a synonym for sum that is not already a Python identifier. Extract the float value from the end of line and then <your sum variable> += float(the float from "line").
Don't forget to initialize both counter and accumulator before the loop.
with open(fname) as f:
    s = 0
    linecount = 0
    for line in f:
        l = line.split()
        try:
            num = float(l[1])
        except (ValueError, IndexError):
            # skip lines whose second field is missing or not a number
            continue
        if l[0] == 'X-DSPAM-Confidence:':
            s += num
            linecount += 1
    print(s/linecount)
Here's how I would do it. I'll happily answer any questions.
Using a regular expression.
For more information, see the documentation for the re module.
Code:
import re
fname = raw_input("Enter file name: ")
f = open(fname)
val_list = []
tot = 0
line_cnt = 0
for line in f:
    # raw string so that \s, \d and \. reach the regex engine unchanged
    a = re.findall(r"X-DSPAM-Confidence:\s*(\d+\.?\d*)",line)
    if len(a) != 0:
        tot += float(a[0])
        line_cnt +=1
print "Line Count is", line_cnt
print "Average is", tot/line_cnt
f.close()
Content of y.txt:
a
X-DSPAM-Confidence: 0.8475
b
X-DSPAM-Confidence: 0.8476
c
X-DSPAM-Confidence: 0.8477
d
X-DSPAM-Confidence: 0.8478
Output:
C:\Users\dinesh_pundkar\Desktop>python c.py
Enter file name: y.txt
Line Count is 4
Average is 0.84765
C:\Users\dinesh_pundkar\Desktop>
Points:
You can open the file using with, as Patrick has done in his answer. If the file is opened using with, there is no need to close it explicitly.
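The counter-and-accumulator approach described in these answers can be sketched as a function that takes any iterable of lines, which makes it easy to try without a file. This is a Python 3 illustration under stated assumptions: average_confidence and total are hypothetical names, and returning None when no line matches is a choice made here, not part of the assignment.

```python
def average_confidence(lines, prefix="X-DSPAM-Confidence:"):
    """Average the float values on lines that start with prefix."""
    total = 0.0  # accumulator, deliberately not named sum
    count = 0    # number of matching lines
    for line in lines:
        if line.startswith(prefix):
            # float() tolerates the leading space and the trailing newline
            total += float(line[len(prefix):])
            count += 1
    return total / count if count else None


sample = ["a\n", "X-DSPAM-Confidence: 0.8475\n", "b\n", "X-DSPAM-Confidence: 0.8476\n"]
print(average_confidence(sample))
```

With a real file, the call would look like average_confidence(f) inside a with open(fname) as f: block.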

How do I remove an specific line in a file, based on a list that is formed of inputs?

I have a file with this format:
Frank,456,768,987
Mike,123,456,798
And I'm using this code:
name = input()
age = float(input())
ident = float(input())
phone = float(input())
f = open("Test.txt","r")
lines = f.readlines()
f.close()
f = open("test.txt", "w")
data = [name, age, ident, phone]
for line in lines:
    if line!= data:
        f.write(line)
So, if the list with the inputs equals a line, that line must be removed. Why is this code not working? The file becomes empty.
One of the problems is that you are comparing a string with a list. The other problem is that you are not closing the file at the end. Here is a version that works:
name = "Frank"
age = 456
ident = 768
phone = 987
f=open("Test.txt","r")
lines=f.readlines()
f.close()
data=",".join(map(str, [name,age,ident,phone]))
with open("test.txt", "w+") as x:
    for line in lines:
        if line.strip() != data:
            x.write(line)
I just hardcoded the values at the beginning for the sake of simplicity. I'm assuming that the file will always have the same correct format. You could also use regex and do some pattern matching. Here I'm making sure that I am converting all the values to string, since join won't accept any integer:
data=",".join(map(str, [name,age,ident,phone]))
You are writing the data into a buffer and you need to flush the text from the buffer to the file and then don't forget to close it.
This should be helpful..
what exactly the python's file.flush() is doing?
Hope it's worked:)
This should work:
f=open("Test.txt","r")
fw=open("test.txt","w")
data= name+","+str(age)+","+str(ident)+","+str(phone)
for line in f:
    if not(data in line):
        fw.write(line)
f.close()
fw.close()
The problem is that line is a string, and data is a list.
Also, do not convert to float, because you will have to convert back to a string.
Finally, this will also fail because of end-of-line characters at the end of each line.
name=input()
age=input()
ident=input()
phone=input()
with open("Test.txt","r") as f:
    lines=f.readlines()
with open("test.txt","w") as f:
    data=[name,age,ident,phone]
    for line in lines:
        if any(l!=d for l,d in zip(line.strip('\n\r').split(','),data)):
            f.write(line)
This solution compares two lists, item by item. Another solution would be to build a unique string with ",".join(data) (see other people's answers).
The following code snippet is working properly.
name=input()
age=input()
ident=input()
phone=input()
f=open("Test.txt","r")
fw=open("TestResult.txt","w")
data= name+","+str(age)+","+str(ident)+","+str(phone)
for line in f:
    if not(data in line):
        fw.write(line)
f.close()
fw.close()
Test.txt input
shovon,23,1628,017
shovo,24,1628,017
shov,25,1628,017
sho,26,1628,017
TestResult.txt output
shovo,24,1628,017
shov,25,1628,017
sho,26,1628,017
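The string-versus-list point made in the answers above can be sketched as a small filter that works on lines already read into memory. This is a Python 3 illustration, not any poster's code: remove_record is a hypothetical name, and it assumes a record matches only when the whole line equals the joined fields, which is stricter than the substring test used in the last answer.

```python
def remove_record(lines, fields):
    """Return the lines that do not equal the comma-joined fields."""
    # Build one string so that a string is compared with a string
    target = ",".join(str(field) for field in fields)
    # Drop the line ending before comparing, but keep it in the output
    return [line for line in lines if line.rstrip("\r\n") != target]


lines = ["Frank,456,768,987\n", "Mike,123,456,798\n"]
kept = remove_record(lines, ["Frank", 456, 768, 987])
print(kept)  # ['Mike,123,456,798\n']
```

The kept lines can then be written back with writelines() on a file opened in a with block, which also takes care of closing it.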
