I have the following code, which prints a certain block of text from a file. However, the output also includes the lines containing the strings I use to locate that block. I want the script to print only the middle lines, without the lines containing "Dipole moment" or "Quadrupole".
f = open('dipole.txt','r')
always_print = False
with f as fp:
    lines = fp.readlines()
    for line in lines:
        if always_print or "Dipole moment" in line:
            print(line)
            always_print = True
        if 'Quadrupole' in line:
            always_print = False
This should work:
f = open('dipole.txt','r')
always_print = False
with f as fp:
    lines = fp.readlines()
    for line in lines:
        if "Dipole moment" in line:
            always_print = True
        elif 'Quadrupole' in line:
            always_print = False
        elif always_print:
            print(line)
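The same flag-based idea can be wrapped in a small generator so it is easy to try without a file. This is only a sketch: the function name `lines_between` and the sample lines are made up for illustration and are not taken from the real dipole.txt.

```python
def lines_between(lines, start_marker, end_marker):
    """Yield the lines strictly between a start-marker line and the next end-marker line."""
    inside = False
    for line in lines:
        if start_marker in line:
            inside = True          # start collecting after this line
        elif end_marker in line:
            inside = False         # stop collecting, skip the marker line itself
        elif inside:
            yield line.rstrip("\n")


# Hypothetical sample data, standing in for the contents of the file.
sample = [
    "header\n",
    "Dipole moment (Debye)\n",
    "X=0.0 Y=0.0 Z=1.85\n",
    "Quadrupole moment\n",
    "trailer\n",
]
print(list(lines_between(sample, "Dipole moment", "Quadrupole")))
```

With a real file, the open file object can be passed in place of `sample`, since the function only iterates over lines.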
Related
Sorry if the header is poorly worded. I have a large file full of data subsets, each with a unique identifier. I want to be able to find the first line containing the identifier and print that line along with every line after that one until the next data subset is reached (that line will start with "<"). The data is structured as shown below.
<ID1|ID1_x
AAA
BBB
CCC
<ID2|ID2_x
DDD
EEE
FFF
<ID3|ID3_x
...
I would like to print:
<(ID2)
DDD
EEE
FFF
So far I have:
with open('file.txt') as f:
    for line in f:
        if 'ID2' in line:
            print(line)
            ...
Try with the code below:
found_id = False
with open('file.txt') as f:
    for line in f:
        if '<ID' in line:
            if '<ID2' in line:
                id_line_split = line.split('|')
                id_line = id_line_split[0][1:]
                print('<(' + str(id_line) + ')')
                found_id = True
            else:
                found_id = False
        else:
            if found_id == True:
                # remove carriage return and line feed
                line = line.replace('\n','')
                line = line.replace('\r','')
                print(line)
Running the previous code on my system with your file.txt produces this output:
<(ID2)
DDD
EEE
FFF
Second question (from comment)
To select ID2 and ID23 (see the question in the comments on this answer), the program has been changed in this way:
found_id = False
with open('file.txt') as f:
    for line in f:
        if '<ID' in line:
            id_line_split = line.split('|')
            id_line = id_line_split[0][1:]
            # compare the whole identifier, so that ID2 does not also match ID20, ID21, ...
            if id_line in ('ID2', 'ID23'):
                print('<(' + str(id_line) + ')')
                found_id = True
            else:
                found_id = False
        else:
            if found_id == True:
                # remove carriage return and line feed
                line = line.replace('\n','')
                line = line.replace('\r','')
                print(line)
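One thing to watch with substring tests such as `'<ID2' in line` is that they also match `<ID20`, `<ID23` and so on. A possible way around this, sketched below, is to parse the identifier out of the header line and compare it exactly against a set of wanted IDs. The function name `extract_records` is hypothetical, and the `ID23` record in the sample is invented for the demonstration.

```python
def extract_records(lines, wanted_ids):
    """Return the header and body lines of every record whose ID is in wanted_ids."""
    wanted = set(wanted_ids)
    output = []
    keep = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("<"):
            # "<ID2|ID2_x" -> "ID2"
            record_id = line[1:].split("|", 1)[0]
            keep = record_id in wanted
            if keep:
                output.append("<(" + record_id + ")")
        elif keep:
            output.append(line)
    return output


# Sample based on the structure in the question; the ID23 record is made up.
sample = [
    "<ID1|ID1_x", "AAA",
    "<ID2|ID2_x", "DDD", "EEE", "FFF",
    "<ID23|ID23_x", "GGG",
    "<ID3|ID3_x", "HHH",
]
print(extract_records(sample, {"ID2"}))
print(extract_records(sample, {"ID2", "ID23"}))
```

Selecting more IDs then only means adding them to the set, without touching the loop.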
In the text file I'm working on there are multiple lines containing the word "TOP", however, I want to get only the first occurrence coming after lines containing the word "IPT".
My second question: would it be better to work with the Pandas library, since this is a CSV (comma-separated values) file?
Here's my code, but it gets all of the lines containing the word "TOP":
temp = { } # Keys will be the line number, and values will be the lines that contains "IPT" with newline character removed
with open("myfile.txt", 'r') as myfile:
    fileNum = 0
    for line in myfile.readlines():
        fileNum +=1
        if line[12:17] == "IPT":
            temp[fileNum] = line.replace('\n', '')
            continue
        if line[12:15] == "TOP":
            print(line)
Example of my text file:
....
....
...SAT...
...
...TOP # I don't want to get this line
...
...
**...IPT...
...
...
...TOP... # I want to get this line**
...
...
...SAT...
...
...TOP... # I don't want to get this line.
**...IPT...
...TOP... # I want to get this line.**
You have two actions to write:
When you haven't seen IPT yet and IPT is in the line: save the line and start looking for TOP.
When you see TOP and IPT has been seen: print the line and stop looking for TOP.
Also, just test for plain string inclusion ("TOP" in line) rather than looking at a specific index; you don't need to be that specific here.
temp = {}
with open("myfile.txt", 'r') as myfile:
    search_mode = False
    for idx, line in enumerate(myfile): # enumerate() returns tuples: index + content
        if not search_mode and "IPT" in line: # action 1
            temp[idx] = line.rstrip()
            search_mode = True
        elif search_mode and "TOP" in line: # action 2
            print(line)
            search_mode = False
Gives:
import json
print(json.dumps(temp, indent=4))
# >>>
...TOP... # I want to get this line**
...TOP... # I want to get this line.**
{
"7": "**...IPT...",
"16": "**...IPT..."
}
A Pandas DataFrame is meant for collections of labeled data (imagine the contents of a CSV file); that's not what you have here.
To fix your code, just add a variable that marks whether IPT has already been found.
temp = { } # Keys will be the line number, and values will be the lines that contains "IPT" with newline character removed
found_ipt = False
with open("myfile.txt", 'r') as myfile:
    fileNum = 0
    for line in myfile.readlines():
        fileNum += 1
        if line[12:17] == "IPT":
            temp[fileNum] = line.replace('\n', '')
            found_ipt = True
        elif line[12:15] == "TOP" and found_ipt:
            print(line)
            found_ipt = False
Keep track of whether you have found IPT yet in a variable "found". Then only look for TOP while found is True. The first TOP you meet while found is True is the one you are looking for, and you can stop looking until the next IPT.
temp = { } # Keys will be the line number, and values will be the lines that contains "IPT" with newline character removed
with open("myfile.txt", 'r') as myfile:
    fileNum = 0
    found = False
    for line in myfile.readlines():
        fileNum += 1
        if line[12:17] == "IPT":
            temp[fileNum] = line.replace('\n', '')
            found = True
        if found and line[12:15] == "TOP":
            print(line)
            # reset instead of break, so later IPT/TOP pairs are handled too
            found = False
lines = myfile.readlines()
for i, line in enumerate(lines):
    ...
    if line[12:17] == "IPT":
        temp[i + 1] = line.replace('\n', '')  # 1-based line number
        for line2 in lines[i + 1:]:
            if line2[12:15] == "TOP":
                print(line2)
                break
When it finds an IPT line, it runs a second loop over the lines that follow the IPT line and stops at the first TOP.
result = {}
with open("myfile.txt", 'r') as f:
    ipt_found = False
    for index, line in enumerate(f):
        # For every line number and line in the file
        if 'IPT' in line:
            # If we find IPT in the line then we set ipt_found to True
            ipt_found = True
        elif 'TOP' in line and ipt_found:
            # If we find TOP in the line and ipt_found is True then we add the line
            result[index] = line
            # Set ipt_found to False so we don't append any more lines with TOP in
            # until we find another line with IPT in
            ipt_found = False
print(result)
That should do it.
temp = { } # Keys will be the line number, and values will be the first "TOP" line after each "IPT", with the newline character removed
with open("myfile.txt", 'r') as myfile:
    # This variable shows if an "IPT" has been found
    string_found = False
    # enumerate yields tuples: the first value is the index (starting at 0), the second the line content
    for line_num, line in enumerate(myfile.readlines()):
        # if the string "IPT" is in our line and we haven't already found a previous IPT, we set string_found to True to signal that we can now get the next "TOP"
        if "IPT" in line and not string_found:
            string_found = True
        # If there is a "TOP" in our line and we have already found an IPT previously, save the line
        elif "TOP" in line and string_found:
            temp[line_num] = line.replace("\n", "")
            string_found = False
print(temp)
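The answers above all follow the same two-state pattern: wait for IPT, then take the next TOP. As a rough sketch, that pattern can be isolated in a function that works on any iterable of lines, which makes it testable without myfile.txt. The name `first_top_after_ipt` is hypothetical, and the sample mirrors the shape of the example file rather than real data.

```python
def first_top_after_ipt(lines):
    """Map 1-based line numbers to the first TOP line that follows each IPT line."""
    found = {}
    waiting = False                      # True once an IPT line has been seen
    for number, line in enumerate(lines, start=1):
        if "IPT" in line:
            waiting = True
        elif waiting and "TOP" in line:
            found[number] = line.rstrip("\n")
            waiting = False              # ignore further TOP lines until the next IPT
    return found


# Sample shaped like the example in the question (contents are placeholders).
sample = [
    "...SAT...", "...TOP...", "...IPT...", "...", "...TOP...",
    "...SAT...", "...TOP...", "...IPT...", "...TOP...",
]
print(first_top_after_ipt(sample))
```

Only lines 5 and 9 of the sample are reported; the TOP lines that come before an IPT or after an already matched one are skipped.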
>gene1
ATGATGATGGCG
>gene2
GGCATATC
CGGATACC
>gene3
TAGCTAGCCCGC
This is the text file which I am trying to read.
I want to read each gene into its own string and then add those strings to a list.
Header lines starting with the '>' character mark where one gene ends and the next begins.
with open('sequences1.txt') as input_data:
    for line in input_data:
        while line != ">":
            list.append(line)
print(list)
When printed, the list should be:
list =["ATGATGATGGCG","GGCATATCCGGATACC","TAGCTAGCCCGC"]
with open('sequences1.txt') as input_data:
    sequences = []
    gene = []
    for line in input_data:
        if line.startswith('>gene'):
            if gene:
                sequences.append(''.join(gene))
                gene = []
        else:
            gene.append(line.strip())
sequences.append(''.join(gene)) # append last gene
print(sequences)
output:
['ATGATGATGGCG', 'GGCATATCCGGATACC', 'TAGCTAGCCCGC']
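An alternative sketch, not taken from the answer above, uses `itertools.groupby` to split the lines into alternating runs of header lines and sequence lines, then joins each run of sequence lines. The function name `read_sequences` is made up; the sample reproduces the file shown in the question.

```python
from itertools import groupby


def read_sequences(lines):
    """Join consecutive non-header lines into one string per gene."""
    stripped = (line.strip() for line in lines)
    non_empty = (line for line in stripped if line)
    sequences = []
    # groupby yields runs of consecutive lines sharing the same key:
    # True for header lines starting with '>', False for sequence lines.
    for is_header, group in groupby(non_empty, key=lambda l: l.startswith(">")):
        if not is_header:
            sequences.append("".join(group))
    return sequences


sample = [
    ">gene1\n", "ATGATGATGGCG\n",
    ">gene2\n", "GGCATATC\n", "CGGATACC\n",
    ">gene3\n", "TAGCTAGCCCGC\n",
]
print(read_sequences(sample))
```

Because the last run is emitted like any other, no separate "append last gene" step is needed. An open file object can be passed instead of `sample`.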
You have multiple mistakes in your code, look here:
with open('sequences1.txt', 'r') as file:
    list = []
    for line in file.read().split('\n'):
        if not line.startswith(">") and len(line$
            list.append(line)
print(list)
Try this:
$ cat genes.txt
>gene1
ATGATGATGGCG
>gene2
GGCATATC
CGGATACC
>gene3
TAGCTAGCCCGC
$ python
>>> genes = []
>>> with open('genes.txt') as file_:
...     for line in file_:
...         if not line.startswith('>'):
...             genes.append(line.strip())
...
>>> print(genes)
['ATGATGATGGCG', 'GGCATATC', 'CGGATACC', 'TAGCTAGCCCGC']
sequences1.txt:
>gene1
ATGATGATGGCG
>gene2
GGCATATC
CGGATACC
>gene3
TAGCTAGCCCGC
and then:
desired_text = []
with open('sequences1.txt') as input_data:
    content = input_data.readlines()
    content = [l.strip() for l in content if l.strip()]
    for line in content:
        if not line.startswith('>'):
            desired_text.append(line)
print(desired_text)
OUTPUT:
['ATGATGATGGCG', 'GGCATATC', 'CGGATACC', 'TAGCTAGCCCGC']
EDIT:
I read the question too quickly; here is a fixed version that produces the desired output:
with open('sequences1.txt') as input_data:
    content = input_data.readlines()
    # you may also want to remove empty lines
    content = [l.strip() for l in content if l.strip()]
    # flag
    nextLine = False
    # list to save the lines
    textList = []
    concatenated = ''
    for line in content:
        find_TC = line.find('gene')
        if find_TC > 0:
            nextLine = not nextLine
        else:
            if nextLine:
                textList.append(line)
            else:
                if find_TC < 0:
                    if concatenated != '':
                        concatenated = concatenated + line
                        textList.append(concatenated)
                    else:
                        concatenated = line
print(textList)
OUTPUT:
['ATGATGATGGCG', 'GGCATATCCGGATACC', 'TAGCTAGCCCGC']
I searched for similar questions but couldn't find any. Please mark this as a duplicate if a similar question already exists.
I'm trying to figure out a way to read and gather multiple pieces of information from a single file. In the file, Block-A, Block-B and Block-C are repeated in random order, and Block-C has more than one piece of information to capture. Every block ends with the text 'END'. Here is the input file:
Block-A:
(info1)
END
Block-B:
(info2)
END
Block-C:
(info3)
(info4)
END
Block-C:
(info7)
(info8)
END
Block-A:
(info5)
END
Block-B:
(info6)
END
Here is my code:
import re
out1 = out2 = out3 = ""
a = b = c = False
array=[]
with open('test.txt', 'r') as f:
    for line in f:
        if line.startswith('Block-A'):
            line = next(f)
            out1 = line
            a = True
        if line.startswith('Block-B'):
            line=next(f)
            out2 = line
            b = True
        if line.startswith('Block-C'):
            c = True
        if c:
            line=next(f)
            if not line.startswith('END\n'):
                out3 = line
                array.append(out3.strip())
        if a == b == c == True:
            print(out1.rstrip() +', ' + out2.rstrip() + ', ' + str(array))
            a = b = c = False
            array=[]
Thank you in advance for your valuable inputs.
Use a dictionary for the data from each block. When you read the line that starts a block, set a variable to that name, and use it as the key into the dictionary.
out = {}
blockname = None
with open('test.txt', 'r') as f:
    for line in f:
        line = line.strip()  # drop the trailing newline before comparing
        if line.endswith(':'):
            blockname = line[:-1]
            if blockname not in out:
                out[blockname] = ''
        elif line == 'END':
            blockname = None
        elif blockname:
            out[blockname] += line
print(out)
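With a plain string per key, repeated blocks (for example the two Block-C sections) run together into one value. If each occurrence should stay separate, one option is to store a list of lists, as in the sketch below. The function name `collect_blocks` is hypothetical, and the sample reproduces the input file from the question.

```python
from collections import defaultdict


def collect_blocks(lines):
    """Group the lines of each block occurrence under the block's name."""
    blocks = defaultdict(list)   # block name -> list of occurrences
    current = None               # lines of the block being read, or None
    for raw in lines:
        line = raw.strip()
        if line.endswith(":"):
            current = []
            blocks[line[:-1]].append(current)
        elif line == "END":
            current = None
        elif current is not None and line:
            current.append(line)
    return dict(blocks)


sample = """Block-A:
(info1)
END
Block-B:
(info2)
END
Block-C:
(info3)
(info4)
END
Block-C:
(info7)
(info8)
END
Block-A:
(info5)
END
Block-B:
(info6)
END
""".splitlines()

for name, occurrences in collect_blocks(sample).items():
    print(name, occurrences)
```

Each value is a list with one inner list per occurrence, so the n-th Block-A, Block-B and Block-C can be paired up afterwards if that is the grouping you need.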
If you don't want the Block-X lines to print, uncomment the elif statement.
data = r'/home/x/Desktop/test'
with open(data, 'r') as txt:
    for line in txt:
        line = line.rstrip('\n')
        if line == 'END':
            pass
        #elif line.startswith('Block'):
        #    pass
        else:
            print(line)
>>>>
Block-A:
(info1)
Block-B:
(info2)
Block-C:
(info3)
(info4)
Block-C:
(info7)
(info8)
Block-A:
(info5)
Block-B:
(info6)
In my code I print the length of a line like this:
line = file.readline()
print("length = ", len(line))
After that, I scan the line character by character like this:
for i in range(len(line)):
    if(file.read(1) == 'b'):
        print("letter 'b' found.")
The problem is that the for loop starts reading on line 2 of the file.
How can I make it start reading at line 1 without closing and reopening the file?
It is possible to use file.seek to move the position of the next read, but that's inefficient. You've already read in the line, so you can just process line without having to read it in a second time.
with open(filename,'r') as f:
    line = f.readline()
    print("length = ", len(line))
    if 'b' in line:
        print("letter 'b' found.")
    for line in f:
        ...
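For completeness, the file.seek route mentioned above could look roughly like the sketch below: after readline() the read position sits at the start of line 2, and seek(0) moves it back to the beginning so that read(1) sees line 1 again. The function name `first_line_contains` is made up, and `io.StringIO` stands in for a real file object purely so the example runs on its own.

```python
import io


def first_line_contains(f, letter):
    """Scan the first line character by character after rewinding with seek(0)."""
    line = f.readline()          # consumes line 1; position is now at line 2
    f.seek(0)                    # rewind to the start of the file
    for _ in range(len(line)):
        if f.read(1) == letter:
            return True
    return False


# io.StringIO is an in-memory stand-in for an open text file (demo assumption).
demo = io.StringIO("abc\nxyz\n")
print(first_line_contains(demo, "b"))
```

This reads line 1 twice, which is why checking `'b' in line` directly, as in the answer above, is usually the simpler choice.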
It seems that you need to handle the first line specially.
lineno = 1
found = False
for line in file:
    if 'b' in line:
        found = True
    if lineno == 1:
        print("length of first line: %d" % len(line))
    lineno += 1
if found:
    print("letter 'b' found.")
It sounds like you want something like this:
with open('file.txt', 'r') as f:
    for line in f:
        for character in line:
            if character == "b":
                print("letter 'b' found.")
or if you just need the number:
with open('file.txt', 'r') as f:
    b = sum(1 for line in f for char in line if char == "b")
print("found %d b" % b)
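If only the count matters, `str.count` can do the per-line work instead of looping over characters. This is a small sketch under the assumption that the input is any iterable of strings; the name `count_letter` and the sample lines are invented for illustration.

```python
def count_letter(lines, letter):
    """Count how many times letter occurs across all lines."""
    return sum(line.count(letter) for line in lines)


# Hypothetical lines standing in for the contents of file.txt.
sample = ["abba\n", "cab\n", "xyz\n"]
print("found %d b" % count_letter(sample, "b"))
```

An open file object can be passed directly, for example `count_letter(f, "b")` inside a `with open(...) as f:` block.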
#!/usr/bin/env python
# Open the file; I assumed it's called somefile.txt
with open('somefile.txt', 'r') as file:
    # Loop through the lines
    for line in file:
        # Test whether the letter 'b' is in each line
        if 'b' in line:
            # Report that we found a 'b' in the line
            print("letter b found")