Find line, then get the next line - python

I have the following problem:
I open a file and read it line by line, searching for a specific pattern. When I find it, I would like to write the entire line AND THE NEXT TWO LINES into a new file. The problem is that I don't know how to get from the line that I've found to the next two.
AAA
XXX
XXX
BBB
XXX
XXX
CCC
XXX
XXX
In this example it would be that I find "BBB" and I want to get the next two lines.
What could I do? Thank you very much for your kind help!
Edit: I realized that I have to ask more precisely.
This is the code from my colleague
for k in range(0,len(watcrd)):
    if cvt[k]>cvmin:
        intwat+=1
        sumcv+=cvt[k]
        sumtrj+=trj[k]/((i+1)*sep/100)
        endline='%5.2f %5.2f' % (cvt[k],trj[k]/((i+1)*sep/100)) # ivan
        ftrj.write(watline[k][:55]+endline+'\n')
        fall.write(watline[k][:55]+endline+'\n')
For every k in range I would like to write k, k+1, k+2 to the file ftrj.
Which is the best way to do this?
Edit 2: I am sorry, but I realized that I've made a mistake. What you suggested worked, but I realized that I have to include it in a different part of the code.
for line in lines[model[i]:model[i+1]]:
    if line.startswith('ATOM'):
        resi=line[22:26]
        resn=line[17:20]
        atn=line[12:16]
        crd=[float(line[31:38]),float(line[38:46]),float(line[46:54])]
        if (resn in noprot)==False and atn.strip().startswith('CA')==True:
            protcrd.append(crd)
        if (resn in reswat)==True and (atn.strip() in atwat)==True:
            watcrd.append(crd)
            watline.append(line)
I would think of something like this:
(...)
        if (resn in reswat)==True and (atn.strip() in atwat)==True:
            watcrd.append(crd)
            watline.append(line)
            for i in range(1, 3):
                try:
                    watcrd.append(crd[line + i])
                    watline.append(line[line + i])
                except IndexError:
                    break
But it doesn't work. How can I indicate the part and the line that I want to append to this list?

Python file objects are iterators, so you can always ask for the next lines:
with open(inputfilename) as infh:
    for line in infh:
        if line.strip() == 'BBB':
            # Get the next two lines:
            print(next(infh))
            print(next(infh))
Here the next() function advances the infh iterator to the next line and returns that line.
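A minimal sketch of the same idea that writes the matching line and the two lines after it to a new file. The function name copy_match_with_following and its parameters are made up for illustration; itertools.islice takes at most count further lines from the file iterator and simply stops early if the file ends first, so no StopIteration escapes.

```python
from itertools import islice

def copy_match_with_following(in_path, out_path, pattern, count=2):
    """Write each line containing `pattern`, plus the next `count` lines, to out_path."""
    with open(in_path) as infh, open(out_path, "w") as outfh:
        for line in infh:
            if pattern in line:
                outfh.write(line)
                # islice pulls at most `count` more lines from the same iterator,
                # so the outer loop resumes after them.
                outfh.writelines(islice(infh, count))
```

Because the following lines are consumed from the same iterator, they are not tested against the pattern themselves.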
However, you are not processing a file but a list, so you can simply access later indices in the list:
        ftrj.write(watline[k][:55]+endline+'\n')
        fall.write(watline[k][:55]+endline+'\n')
        for i in range(1, 3):
            try:
                ftrj.write(watline[k + i][:55]+endline+'\n')
                fall.write(watline[k + i][:55]+endline+'\n')
            except IndexError:
                # we ran out of lines in watline
                break
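Slicing is an alternative to catching IndexError, because a slice that runs past the end of a list is simply shorter. A small sketch, using a hypothetical helper with_following and stand-in data rather than the real watline:

```python
def with_following(items, k, count=2):
    """Return items[k] and up to `count` items after it."""
    # A slice past the end of the list is truncated rather than raising IndexError.
    return items[k:k + count + 1]

watline = ["AAA", "XXX1", "BBB", "XXX2", "XXX3", "CCC", "XXX4"]  # stand-in data
for k, line in enumerate(watline):
    if line == "BBB":
        print(with_following(watline, k))  # ['BBB', 'XXX2', 'XXX3']
```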

Related

Creating a File and then Loop for an Incrementing Value

I am attempting to create a loop that creates a file named "/tmp/newfile.txt" and creates 29 lines of text. Line 1 should read: "I see 0 sheep". For each line, 1 sheep should be added and a new line created to reflect that until 29 sheep (and lines) are reached.
x = 0
myDoc = open("/tmp/newfile.txt", "w+")
 myDoc.write("I see" + str(x) + "sheep")
for line in myDoc:
    x = x + 1
     myDoc.append(x)
    print(myDoc)
    if x == 30
        break;
First, what I tried to do is create a new file and put it into a variable (myDoc) that would open it. I specified w+ so that I would have the ability to read the file and write on it. I gave the changing number a variable 'x'.
The function I intended to, for each line in the file, write "I see x sheep". Afterward, add 1 to the current value of x and append it so it's added to the file. After this, print it so I can see the line(s). Once that value reached 30, cease the loop because 29 is the number of lines I need.
My errors have to do with indentation and nothing being printed at all. I am extremely new to this.
Welcome to StackOverflow!
There seem to be a couple of issues in the code:
Indentation / Syntax Errors - It seems that you are using Python, which follows strict indentation and whitespace rules. An indent is inserted when you enter a new local scope / new control flow / enter an if/elif/else statement or a while or for loop, to separate it from the current scope.
You'd need to remove the space on the left side on line 3 and line 6.
Also, on line 8 there should be a colon(:) after the if x==30.
The mode used (w+) isn't going to work as expected.
This mode truncates the file if it already exists and lets you read from and write to it. Instead, you would need the r+ mode, which keeps the existing contents (but requires the file to already exist).
There's a great explanation & flowchart in this answer explaining the various file modes - https://stackoverflow.com/a/30566011/13307211
The for loop can't iterate over myDoc.
The open function returns a file object (TextIOWrapper). Iterating over it directly yields nothing here, because a file that was just opened with w+ is empty. You could use the myDoc.readlines() method, which returns a list of the lines present in the file, and loop over that - for line in myDoc.readlines().
Printing myDoc and using .append() with myDoc wouldn't work as expected. myDoc represents a file object, which doesn't have an append method. Also, I feel like there might have been some mixed logic here - were you trying to iterate over myDoc like an array and hence pushing values to it?
I'd suggest removing the append part as the past value of x isn't going to be needed for what you want to do.
After applying the above, you should end up with code that looks like this -
x = 0
myDoc = open("./newfile.txt", "r+")
for line in myDoc.readlines():
    myDoc.write("I see" + str(x) + "sheep\n")
    x = x + 1
    if x == 30:
        break
Now, this doesn't exactly do what you want it to do...
The first thing we should do is update the for loop - a for loop should be structured in a way where it has a start, an end, and an increment, or it should iterate over a range of values. Python has a neat range function that allows you to iterate between values.
for x in range(1, 10):
    print(x)
The above prints the values from 1 to 10, excluding 10.
Updating our for loop, we can change the code to -
myDoc = open("./newfile.txt", "r+")
for x in range(1, 30):
    myDoc.write("I see" + str(x) + "sheep")
We could also use a while loop here -
myDoc = open("./newfile.txt", "r+")
x = 1
while x < 30:
    myDoc.write("I see" + str(x) + "sheep")
    x = x + 1
This creates the file, but without line breaks and without the right formatting. "I see " + str(x) + " sheep" fixes the sentence, but to put each string on its own line instead of running them together, you need to add the newline character (\n) at the end of the string -
myDoc = open("./newfile.txt", "r+")
for x in range(1, 30):
    myDoc.write("I see " + str(x) + " sheep\n")
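Putting the pieces together, one possible version is sketched below. It assumes the goal is the lines "I see 0 sheep" through "I see 29 sheep" (the question is ambiguous about whether that means 29 or 30 lines), uses "w" mode so the file is created if it does not exist yet, and wraps the loop in a hypothetical write_sheep function.

```python
def write_sheep(path, last=29):
    """Write 'I see N sheep' for N = 0..last, one sentence per line."""
    # "w" creates the file (or truncates an existing one);
    # the with block closes it when the loop is done.
    with open(path, "w") as my_doc:
        for x in range(last + 1):
            my_doc.write("I see " + str(x) + " sheep\n")

if __name__ == "__main__":
    write_sheep("newfile.txt")
```

Changing range(last + 1) to range(1, last + 1) would start the count at 1 instead of 0.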

My program can't write output in a file in the expected format

I'm working through a few coding problems on this website I found. To my understanding, what the website does to check whether my program is outputting the expected results is that it makes me write the output on a new file line by line, and then it compares my file with the file that contains the answers. I'm trying to submit my solution for a problem and keep getting the following error message:
> Run 1: Execution error: Your program did not produce an answer
that was judged as correct. The program stopped at 0.025 seconds;
it used 9360 KB of memory. At character number 7, your answer says
'<Newline>' while the correct answer says ' '.
Here are the respective outputs:
----- our output ---------
mitnik_2923
Poulsen_557
Tanner_128
Stallman_-311
Ritchie_-1777
Baran_245
Spafford_-1997
Farmer_440
Venema_391
Linus_-599
---- your output ---------
mitnik
_2923Poulsen
_557Tanner
_128Stallman
_-311Ritchie
_-1777Baran
_245Spafford
_-1997Farmer
_440Venema
_391Linus
_-599
--------------------------
I'm pretty sure my program outputs the expected results, but in the wrong format. Now, I've never written stuff on files using Python before, and therefore don't know what I'm supposed to change to get my output in the proper format. Can someone help me? Here's my code:
fin = open ('gift1.in', 'r')
fout = open ('gift1.out', 'w')
NP,d=int(fin.readline()),dict()
for _ in range(NP):
    d[fin.readline()]=0
for _ in range(NP):
    giver=fin.readline()
    amt,ppl=list(map(int,fin.readline().split()))
    if ppl==0 or amt==0:sub=-amt;give=0
    else:sub=amt-(amt%ppl);give=amt//ppl
    d[giver]-=sub
    for per in range(ppl):
        d[fin.readline()]+=give
for i in d: ##I'm doing the outputting in this for loop..
    ans=str(i)+' '+str(d[i])
    fout.write(ans)
fout.close()
The line returned by fin.readline() includes the trailing newline. You should strip that off before using it as the dictionary key. That's why you see a newline after all the names.
fout.write() doesn't add a newline after the string you're writing, you need to add that explicitly. That's why there's no newline between the number and the next name.
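Both effects can be seen in isolation. The sketch below uses io.StringIO objects as stand-ins for gift1.in and gift1.out, and the sample values are only illustrative.

```python
import io

fin = io.StringIO("mitnik\n300 2\n")   # stands in for gift1.in
raw = fin.readline()
print(repr(raw))           # 'mitnik\n'  (newline still attached)
print(repr(raw.strip()))   # 'mitnik'

fout = io.StringIO()                   # stands in for gift1.out
fout.write("mitnik 2923")              # write() adds nothing after the text
fout.write("Poulsen 557" + "\n")       # the newline has to be added explicitly
print(repr(fout.getvalue()))           # 'mitnik 2923Poulsen 557\n'
```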
with open ('gift1.in', 'r') as fin:
    NP = int(fin.readline())
    d = {fin.readline().strip(): 0 for _ in range(NP)}
    for _ in range(NP):
        giver=fin.readline().strip()
        amt, ppl= map(int,fin.readline().split())
        if ppl==0 or amt==0:
            sub=-amt
            give=0
        else:
            sub=amt-(amt%ppl)
            give=amt//ppl
        d[giver]-=sub
        for per in range(ppl):
            d[fin.readline().strip()]+=give
with open ('gift1.out', 'w') as fout:
    for i in d: ##I'm doing the outputting in this for loop..
        ans= i + " " + str(d[i])+'\n'
        fout.write(ans)
Other points:
Don't cram multiple assignments onto the same line unnecessarily. And no need to put the if and else all on 1 line.
i is a string, there's no need to use str(i)
Use a context manager when opening files.

Python: split line by comma, then by space

I'm using Python 3 and I need to parse a line like this
-1 0 1 0 , -1 0 0 1
I want to split this into two lists using Fraction so that I can also parse entries like
1/2 17/12 , 1 0 1 1
My program uses a structure like this
from sys import stdin
...
functions'n'stuff
...
for line in stdin:
and I'm trying to do
for line in stdin:
    X = [str(elem) for elem in line.split(" , ")]
    num = [Fraction(elem) for elem in X[0].split()]
    den = [Fraction(elem) for elem in X[1].split()]
but all I get is a list index out of range error: den = [Fraction(elem) for elem in X[1].split()]
IndexError: list index out of range
I don't get it. I get a string from line. I split that string into two strings at " , " and should get one list X containing two strings. These I split at the whitespace into two separate lists while converting each element into Fraction. What am I missing?
I also tried adding X[-1] = X[-1].strip() to get rid of \n that I get from ending the line.
The problem is that your file has a line without a " , " in it, so the split doesn't return 2 elements.
I'd use split(',') instead, and then use strip to remove the leading and trailing blanks. Note that str(...) is redundant, split already returns strings.
X = [elem.strip() for elem in line.split(",")]
You might also have a blank line at the end of the file, which would still only produce one result for split, so you should have a way to handle that case.
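One way to handle that case is sketched below. The helper name parse_line is hypothetical, and returning None for blank or malformed lines is just one possible policy.

```python
from fractions import Fraction

def parse_line(line):
    """Return (num, den) lists of Fractions, or None for a blank or malformed line."""
    parts = [part.strip() for part in line.split(",")]
    # Require exactly two non-empty halves around a single comma.
    if len(parts) != 2 or not all(parts):
        return None
    num = [Fraction(tok) for tok in parts[0].split()]
    den = [Fraction(tok) for tok in parts[1].split()]
    return num, den

print(parse_line("1/2 17/12 , 1 0 1 1\n"))
print(parse_line("\n"))  # None
```

Inside the for line in stdin loop, a None result can then be skipped with continue.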
With valid input, your code actually works.
You probably get an invalid line, with too much space or even an empty line or so. So first thing inside the loop, print line. Then you know what's going on, you can see right above the error message what the problematic line was.
Or maybe you're not using stdin right. Write the input lines in a file, make sure you only have valid lines (especially no empty lines). Then feed it into your script:
python myscript.py < test.txt
How about this one:
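from fractions import Fraction
pairs = [line.split(",") for line in stdin]
# Each half of a pair is still a string of space-separated entries, so split it again.
num = [[Fraction(x) for x in elem[0].split()] for elem in pairs if len(elem) == 2]
den = [[Fraction(x) for x in elem[1].split()] for elem in pairs if len(elem) == 2]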

Wit's end with file to dict

Python: 2.7.9
I erased all of my code because I'm going nuts.
Here's the gist (it's for a Rosalind challenge thingy):
I want to take a file that looks like this (no quotes on carets)
">"Rosalind_0304
actgatcgtcgctgtactcg
actcgactacgtagctacgtacgctgcatagt
">"Rosalind_2480
gctatcggtactgcgctgctacgtg
ccccccgaagaatagatag
">"Rosalind_2452
cgtacgatctagc
aaattcgcctcgaactcg
etc...
What I can't figure out how to do is basically everything at this point, my mind is so muddled. I'll just show kind of what I was doing, but failing to do.
1st. I want to search the file for '>'
Then assign the rest of that line into the dictionary as a key.
read the next lines up until the next '>' and do some calculations and return
findings into the value for that key.
go through the file and do it for every string.
then compare all values and return the key of whichever one is highest.
Can anyone help?
It might help if I just take a break. I've been coding all day and i think I smell colors.
def func(dna_str):
    bla
    return gcp #gc count percentage returned to the value in dict
With my_function somewhere that returns that percentage value:
with open('rosalind.txt', 'r') as ros:
    rosa = {line[1:].split(' ')[0]:my_function(line.split(' ')[1].strip()) for line in ros if line.strip()}
top_key = max(rosa, key=rosa.get)
print(top_key, rosa.get(top_key))
For each line in the file, that will first check if there's anything left of the line after stripping trailing whitespace, then discard the blank lines. Next, it adds each non-blank line as an entry to a dictionary, with the key being everything to the left of the space except for the unneeded >, and the value being the result of sending everything to the right of the space to your function.
Then it saves the key corresponding to the highest value, then prints that key along with its corresponding value. You're left with a dictionary rosa that you can process however you like.
Complete code of the module:
def my_function(dna):
    return 100 * len(dna.replace('A','').replace('T',''))/len(dna)
with open('rosalind.txt', 'r') as ros:
    with open('rosalind_clean.txt', 'w') as output:
        for line in ros:
            if line.startswith('>'):
                output.write('\n'+line.strip())
            elif line.strip():
                output.write(line.strip())
with open('rosalind_clean.txt', 'r') as ros:
    rosa = {line[1:].split(' ')[0]:my_function(line.split(' ')[1].strip()) for line in ros if line.strip()}
top_key = max(rosa, key=rosa.get)
print(top_key, rosa.get(top_key))
Complete content of rosalind.txt:
>Rosalind_6404 CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCG
TTTCTCTGAGGCTTCCGGCCTTCCCTCCCACTAATAATTCTGAGG
>Rosalind_5959 CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCA
GGCGCTCCGCCGAAGGTCTATATCCA
TTTGTCAGCAGACACGC
>Rosalind_0808 CCACCCTCGTGGT
ATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGACTGGGAACCTGCGGGCAGTAGGTGGAAT
Result when running the module:
Rosalind_0808 60.91954022988506
This should properly handle an input file that doesn't necessarily have one entry per line.
See SO's formatting guide to learn how to make inline or block code tags to get past things like ">". If you want it to appear as regular text rather than code, escape the > with a backslash:
Type:
\>Rosalind
Result:
>Rosalind
I think I got that part down now. Thanks so much. BUUUUT. It's throwing an error about it.
rosa = {line[1:].split(' ')[0]:calc(line.split(' ')[1].strip()) for line in ros if line.strip()}
IndexError: list index out of range
this is my func btw.
def calc(dna_str):
    for x in dna_str:
        if x == 'G':
            gc += 1
            divc += 1
        elif x == 'C':
            gc += 1
            divc += 1
        else:
            divc += 1
    gcp = float(gc/divc)
    return gcp
Exact test file. no blank lines before or after.
>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT
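In this test file each header sits on its own line with no space after the name, so line.split(' ')[1] has nothing to return, which is a likely source of the IndexError. A sketch of one way to read this layout directly is shown below; the names read_fasta and gc_percent are made up for illustration, and this is not the code from the answer above. Multiplying by 100.0 keeps the division a float division on Python 2.7 as well.

```python
def gc_percent(dna):
    """Percentage of G and C characters in a DNA string."""
    dna = dna.upper()
    return 100.0 * (dna.count("G") + dna.count("C")) / len(dna)

def read_fasta(lines):
    """Map each '>' header (without the '>') to its concatenated sequence."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue                  # ignore blank lines
        if line.startswith(">"):
            name = line[1:]           # start a new record
            records[name] = ""
        elif name is not None:
            records[name] += line     # continuation of the current sequence
    return records

sample = """>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT
""".splitlines()

records = read_fasta(sample)
gc = {name: gc_percent(seq) for name, seq in records.items()}
top = max(gc, key=gc.get)
print(top, gc[top])
```

To run it on a real file, pass the open file object to read_fasta instead of the sample list.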

Python/IPython strange non reproducible list index out of range error

I have recently been learning some Python and how to apply it to my work. I have written a couple of scripts successfully, but I am having an issue I just cannot figure out.
I am opening a file with ~4000 lines, two tab separated columns per line. When reading the input file, I get an index error saying that the list index is out of range. However, while I get the error every time, it doesn't happen on the same line every time (as in, it will throw the error on different lines everytime!). So, for some reason, it works generally but then (seemingly) randomly fails.
As I literally only started learning Python last week, I am stumped. I have looked around for the same problem, but not found anything similar. Furthermore I don't know if this is a problem that is language specific or IPython specific. Any help would be greatly appreciated!
input = open("count.txt", "r")
changelist = []
listtosort = []
second = str()
output = open("output.txt", "w")
for each in input:
    splits = each.split("\t")
    changelist = list(splits[0])
    second = int(splits[1])
    print second
    if changelist[7] == ";":
        changelist.insert(6, "000")
        va = "".join(changelist)
        var = va + ("\t") + str(second)
        listtosort.append(var)
        output.write(var)
    elif changelist[8] == ";":
        changelist.insert(6, "00")
        va = "".join(changelist)
        var = va + ("\t") + str(second)
        listtosort.append(var)
        output.write(var)
    elif changelist[9] == ";":
        changelist.insert(6, "0")
        va = "".join(changelist)
        var = va + ("\t") + str(second)
        listtosort.append(var)
        output.write(var)
    else:
        #output.write(str("".join(changelist)))
        va = "".join(changelist)
        var = va + ("\t") + str(second)
        listtosort.append(var)
        output.write(var)
output.close()
The error
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
/home/a/Desktop/sharedfolder/ipytest/individ.ins.count.test/<ipython-input-87-32f9b0a1951b> in <module>()
57 splits = each.split("\t")
58 changelist = list(splits[0])
---> 59 second = int(splits[1])
60
61 print second
IndexError: list index out of range
Input:
ID=cds0;Name=NP_414542.1;Parent=gene0;Dbxref=ASAP:ABE-0000006,UniProtKB%2FSwiss-Prot:P0AD86,Genbank:NP_414542.1,EcoGene:EG11277,GeneID:944742;gbkey=CDS;product=thr 12
ID=cds1000;Name=NP_415538.1;Parent=gene1035;Dbxref=ASAP:ABE-0003451,UniProtKB%2FSwiss-Prot:P31545,Genbank:NP_415538.1,EcoGene:EG11735,GeneID:946500;gbkey=CDS;product=deferrrochelatase%2C 50
ID=cds1001;Name=NP_415539.1;Parent=gene1036;Note=PhoB-dependent%2C 36
Desired output:
ID=cds0000;Name=NP_414542.1;Parent=gene0;Dbxref=ASAP:ABE-0000006,UniProtKB%2FSwiss-Prot:P0AD86,Genbank:NP_414542.1,EcoGene:EG11277,GeneID:944742;gbkey=CDS;product=thr 12
ID=cds1000;Name=NP_415538.1;Parent=gene1035;Dbxref=ASAP:ABE-0003451,UniProtKB%2FSwiss-Prot:P31545,Genbank:NP_415538.1,EcoGene:EG11735,GeneID:946500;gbkey=CDS;product=deferrrochelatase%2C 50
ID=cds1001;Name=NP_415539.1;Parent=gene1036;Note=PhoB-dependent%2C 36
The reason you're getting the IndexError is that your input-file is apparently not entirely tab delimited. That's why there is nothing at splits[1] when you attempt to access it.
Your code could use some refactoring. First of all, you're repeating yourself with the if-checks, which is unnecessary. The code below just pads the cds0 part to 7 characters, which is probably not exactly what you want. I threw it together to demonstrate how you could refactor your code to be a little more Pythonic and DRY. I can't guarantee it'll work with your dataset, but I'm hoping it might help you understand how to do things differently.
to_sort = []
# We can open two files using the with statement. This will also handle
# closing the files for us, when we exit the block.
with open("count.txt", "r") as inp, open("output.txt", "w") as out:
    for each in inp:
        # Split at ';'... So you won't have to worry about whether or not
        # the file is tab delimited
        changed = each.split(";")
        # Get the value you want. This is called unpacking.
        # The value before '=' will always be 'ID', so we don't really care about it.
        # _ is generally used as a variable name when the value is discarded.
        _, value = changed[0].split("=")
        # 0-pad the desired value to 7 characters. Python string formatting
        # makes this very easy. This will replace the current value in the list.
        changed[0] = "ID={:0<7}".format(value)
        # Join the changed-list with the original separator
        # and append it to the sort list.
        to_sort.append(";".join(changed))
    # Write the results to the file all at once. Your test data already
    # provided the newlines, you can just write it out as it is.
    out.writelines(to_sort)
# Do what else you need to do. Maybe to_sort.sort()?
You'll notice that this reduces your code to 8 lines, does not repeat itself, and is pretty easy to understand.
Please read PEP 8 and the Zen of Python, and go through the official tutorial.
This happens when there is a line in count.txt which doesn't contain the tab character. So when you split by tab character there will not be any splits[1]. Hence the error "Index out of range".
To know which line is causing the error, just add a print(each) after splits in line 57. The line printed before the error message is your culprit. If your input file keeps changing, then you will get different locations. Change your script to handle such malformed lines.
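A sketch of what handling malformed lines could look like. The helper split_count_line and the sample lines are made up for illustration; skipping and reporting is only one option, and raising an error with the line number would be another.

```python
def split_count_line(line):
    """Return (name, count) for a 'name<TAB>count' line, or None if malformed."""
    parts = line.rstrip("\n").split("\t")
    # Expect exactly two columns, the second being a non-negative integer.
    if len(parts) != 2 or not parts[1].strip().isdigit():
        return None
    return parts[0], int(parts[1])

sample = ["ID=cds0;product=thr\t12\n", "no tab here\n", "\n"]
for lineno, line in enumerate(sample, start=1):
    parsed = split_count_line(line)
    if parsed is None:
        print("skipping malformed line %d: %r" % (lineno, line))
    else:
        print(parsed)
```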
