I'm a beginner with Python.
I tried to solve this problem: "Given a file containing fewer than 1000 lines, how do I print only the odd-numbered lines?" This is my code:
with open(r'C:\Users\Savina\Desktop\rosalind_ini5.txt') as f:
    n = 1
    num_lines = sum(1 for line in f)
    while n < num_lines:
        if n/2 != 0:
            a = f.readlines()[n]
            print(a)
            break
        n = n + 2
where n is a counter and num_lines calculates how many lines the file contains.
But when I try to execute the code, it says:
"a=f.readlines()[n]
IndexError: list index out of range"
Why doesn't it recognize n as a counter?
You have put the call to readlines inside a loop, but that is not its intended use:
readlines reads the whole file at once and returns a LIST
of newline-terminated strings.
You may want to save that list and operate on it:
list_of_lines = open(filename).readlines()  # no explicit close needed: CPython closes the file once it is no longer referenced
odd = 1
for line in list_of_lines:
    if odd: print(line, end='')
    odd = 1 - odd
Two remarks:
odd alternates between 1 (true when used as the condition of an if) and 0 (false when used as the condition of an if);
the optional argument end='' to the print function is required because each line in list_of_lines already ends with a newline character; if you omit it, print outputs a SECOND newline character at the end of each line.
Coming back to your code, you can fix its behavior by calling
f.seek(0)
before the loop to rewind the file to its beginning, and by calling
f.readline() (note: readline, NOT readline**S**) inside the loop once for every line, including the lines you skip,
but rest assured that proceeding like this is, let's say, a bit unconventional...
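For comparison, here is a sketch of that rewind approach applied to the question's idea of counting the lines first. It assumes Python 3 and uses io.StringIO as a stand-in for the real file; odd_lines_by_rewinding is a hypothetical helper name. readline() is called once per line, including the skipped ones, so the file position stays in step with the counter.

```python
import io


def odd_lines_by_rewinding(f):
    """Return the 1st, 3rd, 5th, ... lines of the open text file f."""
    num_lines = sum(1 for _ in f)  # this first pass leaves f at end-of-file
    f.seek(0)                      # rewind to the beginning so it can be read again
    result = []
    for n in range(1, num_lines + 1):
        line = f.readline()        # readline() returns exactly one line per call
        if n % 2 == 1:             # keep lines whose 1-based number is odd
            result.append(line)
    return result


# In-memory stand-in for rosalind_ini5.txt
sample = io.StringIO("one\ntwo\nthree\nfour\nfive\n")
print(''.join(odd_lines_by_rewinding(sample)), end='')
```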
Finally, you can do everything you want with a one-liner
print(''.join(open(filename).readlines()[::2]), end='')
that uses slice notation for lists and the string method .join().
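readlines()[::2] loads the whole file before slicing it. If you would rather not hold the whole file in memory, itertools.islice from the standard library applies the same start/stop/step slicing lazily to any iterable, including an open file. A minimal sketch (odd_lines_lazy is a hypothetical name; io.StringIO stands in for the file):

```python
import io
from itertools import islice


def odd_lines_lazy(f):
    """Lazily yield the 1st, 3rd, 5th, ... lines of the open file f."""
    # islice(iterable, start, stop, step): begin at index 0, no stop, every 2nd item
    return islice(f, 0, None, 2)


sample = io.StringIO("one\ntwo\nthree\nfour\nfive\n")
print(''.join(odd_lines_lazy(sample)), end='')  # prints one, three and five, each on its own line
```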
Well, I'd personally do it like this:
def print_odd_lines(some_file):
    with open(some_file) as my_file:
        for index, each_line in enumerate(my_file):  # keep track of the index of each line
            if index % 2 == 1:  # check if index is odd (index is 0-based, so this selects the 2nd, 4th, ... lines)
                print(each_line)  # if it is, print it


if __name__ == '__main__':
    print_odd_lines(r'C:\Users\Savina\Desktop\rosalind_ini5.txt')  # raw string, so '\U' is not read as an escape
Be aware that this prints an extra blank line after each line, because each_line already ends with a newline and print() adds another one. I'm sure you can figure out how to get rid of it.
This code will do exactly as you asked:
with open(r'C:\Users\Savina\Desktop\rosalind_ini5.txt') as f:
    for i, line in enumerate(f):  # Iterate over each line and add an index (i) to it.
        if i % 2 == 0:  # i starts at 0 in Python, so if i is even, the line is odd
            print(line, end='')  # line already ends with '\n'
To explain what happens in your code:
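Once a file has been read to the end, further reads return nothing until you rewind it with f.seek(0) or close and reopen it.
You first iterate over the entire file in num_lines=sum(1 for line in f). After that, f is positioned at the end of the file, so it has nothing left to give.
Then, whenever the if condition is true, you call f.readlines(). It tries to go through all the lines again, but none are left, so it returns an empty list, and indexing that list with [n] raises the IndexError. (In Python 3, n/2!=0 is true for every nonzero n, so this happens on the very first pass.) Even if it worked, every such call would go through the entire file; it is faster to go through it once (as in the solutions offered to your question).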
As a fix, you need to type
f.close()
f = open(r'C:\Users\Savina\Desktop\rosalind_ini5.txt')
every time you have read through the file, in order to get back to the start.
As a side note, you should look up the modulus operator % for finding odd numbers.
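Why the question's test if n/2!=0 never filters anything: in Python 3, / is true division, so n/2 is 0.5, 1.0, 1.5, ... and never equals 0 for n >= 1. The remainder operator % is what tells odd from even. A small check (is_odd is just an illustrative name):

```python
def is_odd(n):
    """True when n leaves a remainder of 1 when divided by 2."""
    return n % 2 == 1


for n in range(1, 6):
    # n / 2 != 0 is True for every n here; only n % 2 tells odd from even
    print(n, n / 2 != 0, is_odd(n))
```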
Related
I have a function gen_rand_index that generates a random group of numbers in list format, such as [3,1] or [3,2,1]
I also have a text file that reads something like this:
red $1
green $5
blue $6
How do I write a function so that, once Python generates this list of numbers, it automatically reads those line numbers from the text file? So if it generated [2,1], instead of printing [2,1] I would get "green $5, red $1", i.e. the second line of the text file and then the first line?
I know that you can do print(line[2]) and commands like that, but this won't work in my case, because each time I get a different random set of line numbers; it is not a fixed line that I want to read each time.
row = str(result[gen_rand_index])  # result[gen_rand_index] gives me the random list of numbers
file = open("Foodinventory.txt", 'r')
for line in file:
    print(line[row])
file.close()
I have this so far, but I am getting this
error: invalid literal for int() with base 10: '[4, 1]'
I also have gotten
TypeError: string indices must be integers
I have tried replacing str with int and many things like that, but I think my whole approach is wrong. Can anyone help me? (I have only been coding for a couple of days, so I apologize in advance if this question is really basic.)
Okay, let's first get some things out of the way.
Whenever you access something from a list, the thing you put inside the square brackets [] should be an integer, e.g. [5]. This tells Python that you want the element at index 5 (the sixth element, since indexing starts at 0). You cannot use ["5"], because "5" is a string, not an integer.
Therefore the line row = str(result[gen_rand_index]) should just be row = result[gen_rand_index], without the call to str. Using a string as an index is why you got the TypeError about string indices.
Secondly, according to your description, gen_rand_index returns a list of numbers.
So going by that, why don't you try this:
indices_to_pull = gen_rand_index()
with open("Foodinventory.txt", 'r') as file_handle:
    file_contents = file_handle.readlines()  # fine if the file is small and simple
answer = []
for index in indices_to_pull:
    answer.append(file_contents[index-1])  # line numbers are 1-based, list indices 0-based
Explanation
We get the indices of the file lines from gen_rand_index
We read the entire file into memory using readlines().
Then we pick out the lines we want. Remember to subtract 1, as the list is indexed from 0.
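Put together as a function, the steps above might look like the sketch below. lines_at is a hypothetical helper; it takes the list of 1-based line numbers that gen_rand_index() produces in the question and returns the matching lines in the same order, with the trailing newlines removed so they can be joined with ", ".

```python
def lines_at(path, line_numbers):
    """Return the lines of the file at `path` for the given 1-based line
    numbers, in the order the numbers are given (e.g. [2, 1] -> 2nd, then 1st).
    Assumes every number is between 1 and the number of lines in the file."""
    with open(path) as f:
        all_lines = f.read().splitlines()  # splitlines() drops the trailing '\n'
    return [all_lines[n - 1] for n in line_numbers]  # lists are 0-indexed


# Example usage, with the file from the question and [2, 1] standing in
# for the output of gen_rand_index():
# print(', '.join(lines_at("Foodinventory.txt", [2, 1])))  # green $5, red $1
```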
The error you are getting is because you're trying to index a string variable (line) with a string index (row). Presumably row will contain something like '[2,3,1]'.
However, even if row was a numerical index, you're not indexing what you think you're indexing. The variable line is a string, and it contains (on any given iteration) one line of the file. Indexing this variable will give you a single character. For example, if line contains green $5, then line[2] will yield 'e'.
It looks like your intent is to index into a list of strings, which represent all the lines of the file.
If your file is not overly large, you can read the entire file into a list of lines, and then just index that list:
with open('file.txt') as fp:
    lines = fp.readlines()
print(lines[2])
In this case, lines[2] will yield the string 'blue $6\n'.
To discard the trailing newline, use lines[2].strip() instead.
I'll go line by line and raise some issues.
row = str(result[gen_rand_index]) #result[gen_rand_index] gives me the random list of numbers
Are you sure it is gen_rand_index and not gen_rand_index()? If gen_rand_index is a function, you should call the function. In the code you have, you are not calling the function, instead you are using the function directly as an index.
file = open("Foodinventory.txt", 'r')
for line in file:
    print(line[row])
file.close()
The idiomatic way in Python to open a file and read it line by line is
with open("Foodinventory.txt", "r") as f:
    for line in f:
        ...
This way you do not have to close the file; the with clause does this for you automatically.
Now, what you want to do is print the lines of the file that correspond to the elements of your variable row. So you need an if statement that checks whether the line number you just read from the file matches a line number in your list row.
with open("Foodinventory.txt", "r") as f:
    for i, line in enumerate(f):
        if i == row[i]:
            print(line)
But this is wrong: it would work only if your list's elements are ordered. That is not the case in your question. So let's think a little bit. You could iterate over your file multiple times, and each time you iterate over it, print out one line. But this will be inefficient: it will take time O(nm) where n==len(row) and m == number of lines in your file.
A better solution is to read all the lines of the file into a list, then print the corresponding indices from that list:
with open("Foodinventory.txt", "r") as f:
    arr = list(f)
for i in row:
    print(arr[i - 1])  # lists are zero-indexed; row holds 1-based line numbers
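If the file is too large to load with list(f), the multiple-pass cost described above can still be avoided. A sketch, assuming row holds 1-based line numbers in any order: make one pass, keep only the requested lines in a dict, then emit them in the requested order. This reads the file once and stores at most len(row) lines. lines_at_single_pass is a hypothetical name.

```python
def lines_at_single_pass(path, line_numbers):
    """Return the requested 1-based lines in the requested order, reading the
    file once and keeping only the wanted lines in memory."""
    wanted = set(line_numbers)
    found = {}  # line number -> line text without its trailing newline
    with open(path) as f:
        for number, line in enumerate(f, start=1):  # start=1 gives 1-based numbers
            if number in wanted:
                found[number] = line.rstrip('\n')
    # raises KeyError if a requested number is larger than the file's line count
    return [found[n] for n in line_numbers]
```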
EDIT: I rewrote the entire post to make the problem clearer:
s = "GATATATGCATATACTT"
t = "ATAT"
for i in range(len(s)):
    if t == s[i:i+len(t)]:
        print(i+1, end=' ')
So the purpose of the program above is to scan the long DNA string (s) for the short DNA string (t), in order to find the positions in s where t matches. The output of the above code is:
2 4 10
These are the positions in string s where string t matches; as can be seen in the code above, it prints i+1 to give 1-based numbering.
The problem I'm having is that when I change the code to read the values for s and t from a file, the readline() function doesn't work for me. The motif.txt file contains two strings of DNA, one on each line.
with open('txt/motif.txt', 'r') as f:
    s = f.readline()
    t = f.readline()
for i in range(len(s)):
    if t == s[i:i+len(t)]:
        print(i+1, end=' ')
This code, on the other hand, outputs nothing at all. But when I change t to:
t = f.readline().strip()
Then the program outputs the same result as the first example did.
I hope this makes things clearer. My question is: if readline() returns a string, why doesn't my program in example 2 work the same way as the very first example?
Your problem statement is wrong: there's no way s or t has more content (and len(s) > 0 or len(t) > 0) in the first example than in the second.
Basically, with:
s = f.readline()
s will contain a string like "foobar \n", and thus len(s) will be 8.
Then with:
s = f.readline().strip()
with the same string, len(s) will be 6 because the stripped string is "foobar".
So if your line contains only whitespace, like s = " \n", s.strip() will be the empty string "", with len(s) == 0.
In that case your loop won't run at all and will never print anything.
In almost all the other cases I can think of, you should get an exception raised, not a silent exit.
But to be honest, your code is bad because nobody can understand what you want to do from reading it (including you in six months).
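For the DNA example specifically, the symptom also fits the newline that readline() keeps at the end of each line. Assuming motif.txt's second line ends with a newline, t becomes "ATAT\n", and no slice of s equals that string, so nothing is printed; stripping both strings restores the expected output. A sketch that reproduces this without a file (motif_positions is an illustrative name):

```python
def motif_positions(s, t):
    """Return the 1-based positions where t starts in s (overlaps allowed)."""
    return [i + 1 for i in range(len(s)) if s[i:i + len(t)] == t]


# What two readline() calls would return for the question's motif.txt,
# assuming both lines end with a newline:
s = "GATATATGCATATACTT\n"
t = "ATAT\n"

print(repr(t))                                # shows that '\n' is part of the string
print(motif_positions(s, t))                  # []: no slice of s equals t
print(motif_positions(s.strip(), t.strip()))  # [2, 4, 10]
```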
I'm trying to find a certain word in a file and want to print the next line when a condition is met.
f = open('/path/to/file.txt','r')
lines = f.readlines()
for line in lines:
    if 'P/E' in line:
        n = lines.index(line)  # get index of current line
        print(lines[n+1])  # print the next line
f.close()
The string 'P/E' will be present 4 times in the file, each time in a different line.
When executed, the code correctly prints the line after each of the first 2 occurrences of 'P/E'. Then it goes back, prints the lines after those same first 2 occurrences again, and exits. The loop never gets past the first 2 occurrences; it repeats them and exits.
I checked the data file, and the line following each 'P/E' is different every time.
How can I resolve this? Thanks.
list.index() with just one argument only finds the first occurrence. To find elements past the previous index you have to give it a starting point: list.index() takes a second argument that tells it where to start searching.
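The likely reason for the repeats is that the third and fourth 'P/E' lines have exactly the same text as the first two, so lines.index(line) returns the earlier positions. To illustrate the second argument of list.index(), here is a sketch that collects every index of a value by restarting the search just past the previous match (all_indices is an illustrative name, and the sample lines are made up):

```python
def all_indices(items, value):
    """Return every index at which value occurs in items."""
    positions = []
    start = 0
    while True:
        try:
            pos = items.index(value, start)  # search only from `start` onwards
        except ValueError:                   # raised when no further match exists
            return positions
        positions.append(pos)
        start = pos + 1                      # resume just past the last match


lines = ["P/E\n", "12.5\n", "P/E\n", "9.8\n"]
print(all_indices(lines, "P/E\n"))  # [0, 2]
```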
However, you don't need lines.index() at all; it is very inefficient, because it scans the list from the start, testing each line until a match is found.
Just use the enumerate() function to add indices as you loop:
for index, line in enumerate(lines):
    if 'P/E' in line:
        print(lines[index + 1])
Be careful: index + 1 may not be a valid index. If 'P/E' appears in the very last line of the lines list, you'll get an IndexError, so you may have to add "and index + 1 < len(lines)" to the if test.
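With that check added, the enumerate() version could be wrapped up as below. lines_after is a hypothetical helper name; it returns the lines instead of printing them so the result is easy to inspect, and it skips a 'P/E' on the last line instead of raising.

```python
def lines_after(lines, marker):
    """Return the line that follows each line containing marker.
    A match on the very last line is skipped instead of raising IndexError."""
    return [lines[index + 1]
            for index, line in enumerate(lines)
            if marker in line and index + 1 < len(lines)]


sample = ["P/E\n", "10\n", "other\n", "P/E\n", "20\n", "P/E\n"]
for next_line in lines_after(sample, "P/E"):
    print(next_line, end='')
```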
Note that file.readlines() reads the whole file into memory in one go. Try to avoid this; you could loop directly over the file and remember the previous line instead:
with open('/path/to/file.txt','r') as f:
    previous = ''
    for line in f:
        if 'P/E' in previous:
            print(line)  # print this line
        previous = line  # remember for the next iteration
I have 2 text files (new.txt and master.txt). Each has different data stored as such:
Cory 12 12:40:12.016221
Suzy 64 12:40:33.404614
Trent 145 12:40:56.640052
(categorised by the first number appearing on each line)
I have to scan each line of new.txt for the name (e.g. Suzy), check whether there is a duplicate in master.txt, and if there isn't, add that line to master.txt, categorized by that line's number (e.g. 64 in Suzy 64 12:40:33.404614).
I have written the following script, but it gets stuck in a loop checking the 1st line of new.txt. I know why; I just don't know how to open fileinput.input(master.txt) further down the loop without closing fileinput.input(new.txt). I feel like I've overcomplicated things for myself, and any help is appreciated.
import fileinput
import re
end_of_file = False
while end_of_file == False:
    for line in fileinput.input('new.txt', inplace=1):
        end_of_file = fileinput.isstdin()  # ends while loop if on last line of new.txt
        user_f_line_list = line.split()
        master_f = open('master.txt', 'r')
        master_f_read = master_f.read()
        master_f.close()
        fileinput.close()
        if not re.findall(user_f_line_list[0], master_f_read):
            for line in fileinput.input('master.txt', inplace=1):
                master_line_list = line.split()
                if int(user_f_line_list[1]) <= int(master_line_list[1]):
                    written = False
                    while written == False:
                        written = True
                        print(' '.join(user_f_line_list))
                print(line, end='')
            fileinput.close()
And for reference, master.txt starts with the line startline 0 and ends with the line endline 1000000000000000, so that the categorizing can never go out of range.
Some suggestions:
Read master.txt into a list with readlines().
Use an OrderedDict from the collections module; it works like a regular dict but preserves insertion order (since Python 3.7, a plain dict preserves insertion order too). Make each key the unique element, a tuple in this case (e.g. ("Cory", 12)), and make the value whatever comes after.
Now you can very rapidly check to see if the entry is present by if key in my_dict:.
If it isn't, you can insert it. Inserting in order takes a bit more work, but not much: I would insert at the end, convert to a list when all is done, and sort the list with a custom key function that specifies how to sort.
Output it back to the file.
I won't say it's necessarily shorter than your solution, but it is a lot cleaner.
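Here is a minimal sketch of those suggestions, under a few assumptions. Each line is whitespace-separated as name, number, time. Duplicates are detected by name alone, as the question describes (the answer's (name, number) tuple key can be swapped in if that fits better). A plain dict stands in for OrderedDict, since Python 3.7+ dicts keep insertion order and the final sort fixes the order anyway. merge_new_into_master is a hypothetical name, and it rewrites master.txt in place, so try it on a copy first.

```python
def merge_new_into_master(new_path, master_path):
    """Add lines from new_path whose name is not already in master_path,
    then rewrite master_path sorted by the number in the second column."""
    entries = {}  # name -> full line without its trailing newline
    with open(master_path) as f:
        for line in f:
            if line.strip():  # skip blank lines
                entries[line.split()[0]] = line.rstrip('\n')
    with open(new_path) as f:
        for line in f:
            if line.strip():
                name = line.split()[0]
                if name not in entries:  # fast membership test on dict keys
                    entries[name] = line.rstrip('\n')
    # Sort by the integer in the second column, e.g. 64 in "Suzy 64 12:40:33.404614";
    # the startline/endline sentinels end up first and last automatically.
    ordered = sorted(entries.values(), key=lambda entry: int(entry.split()[1]))
    with open(master_path, 'w') as f:
        f.write('\n'.join(ordered) + '\n')
```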
There's a text file that I'm reading line by line. It looks something like this:
3
3
67
46
67
3
46
Each time the program encounters a new number, it writes it to a text file. The way I'm thinking of doing this is writing the first number to the file, then looking at the second number and checking if it's already in the output file. If it isn't, it writes THAT number to the file. If it is, it skips that line to avoid repetitions and goes on to the next line. How do I do this?
Rather than searching your output file, keep a set of the numbers you've written, and only write numbers that are not in the set.
Instead of checking the output file to see whether the number was already written, it is better to keep this information in a variable (a set or list). That saves you disk reads.
To go through the numbers, you need to loop over each line of the input file; you can do that with a for line in open('input'): loop, where input is the name of your file. On each iteration, line contains one line of the input file, ending with the end-of-line character '\n'.
In each iteration, try to convert the value on that line to a number with the int() function. You may want to protect yourself against empty lines or non-numeric values with a try statement.
Once you have the number, check whether it has already been written to the output file by looking it up in the set of already written numbers. If it is not in the set yet, add it and write it to the output file.
#!/usr/bin/env python3
numbers = set()  # create a set for storing numbers that were already written
with open('output', 'w') as out:  # open 'output' file for writing; closed automatically
    for line in open('input'):  # loop through each line of 'input' file
        try:
            i = int(line)  # try to convert line to integer
        except ValueError:  # if conversion to integer fails display a warning
            print("Warning: cannot convert to number string '%s'" % line.strip())
            continue  # skip to next line on error
        if i not in numbers:  # check if the number wasn't already added to the set
            out.write('%d\n' % i)  # write the number to the 'output' file followed by EOL
            numbers.add(i)  # add number to the set to mark it as already added
This example assumes that your input file contains a single number on each line. In case of an empty or incorrect line, a warning is printed to stdout.
You could also use a list in the above example, but it may be less efficient, because checking membership in a list scans it element by element.
Instead of numbers = set() use numbers = [], and instead of numbers.add(i) use numbers.append(i). The if condition stays the same.
Don't do that. Use a set() to keep track of all the numbers you have seen. It will only have one of each.
numbers = set()
for line in open("numberfile"):
    numbers.add(int(line.strip()))
open("outputfile", "w").write("\n".join(str(n) for n in numbers))
Note this reads them all, then writes them all out at once. Because a set is unordered, this will generally put them in a different order than in the original file; if you want ascending numeric order, join sorted(numbers) instead. If you don't want the order to change, you can also write them as you read them, but only if they are not already in the set:
numbers = set()
with open("outfile", "w") as outfile:
    for line in open("numberfile"):
        number = int(line.strip())
        if number not in numbers:
            outfile.write(str(number) + "\n")
            numbers.add(number)
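Two standard-library variations on the set approach, sketched under the assumption that every non-blank line holds one integer: sorted() gives ascending order, and dict.fromkeys() removes duplicates while keeping first-seen order (dict keys are unique and, since Python 3.7, keep insertion order). The helper names are illustrative.

```python
def unique_sorted(path):
    """Distinct integers from the file, in ascending order."""
    with open(path) as f:
        return sorted({int(line) for line in f if line.strip()})


def unique_in_order(path):
    """Distinct integers from the file, in the order they first appear."""
    with open(path) as f:
        numbers = [int(line) for line in f if line.strip()]  # int() ignores the '\n'
    return list(dict.fromkeys(numbers))
```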
Are you working with exceptionally large files? You probably don't want to try to "search" the file you're writing to for a value you just wrote. You (probably) want something more like this:
encountered = set()
with open('file1') as fhi, open('file2', 'w') as fho:
    for line in fhi:
        if line not in encountered:
            encountered.add(line)
            fho.write(line)
If you want to scan through a file to see if it contains a number on any line, you could do something like this:
def file_contains(f, n):
    with f:
        for line in f:
            if int(line.strip()) == n:
                return True
    return False
However, as Ned points out in his answer, this isn't a very efficient solution: if you have to search through the file again for each line, the running time of your program grows in proportion to the square of the number of numbers.
If the number of values is not incredibly large, it would be more efficient to use a set. Sets are designed to keep track of unordered values very efficiently. For example:
with open("input_file.txt", "rt") as in_file:
    with open("output_file.txt", "wt") as out_file:
        encountered_numbers = set()
        for line in in_file:
            n = int(line.strip())
            if n not in encountered_numbers:
                encountered_numbers.add(n)
                out_file.write(line)