Take a string from a file in which it occupies multiple lines - Python

I have a problem with a Python program. In this program I have to read strings from a file and save them to a list. The problem is that some strings in this file span several lines.
The file, named 'ft1.txt', is structured like this:
'''
home
wo
rk
'''
sec
urity
'''
inform
atio
n
'''
Consequently, opening the file and calling f.read() gives:
" \n\nhome\nwo\nrk\n\nsec\nurity\n\ninform\natio\nn ".
I run the following code:
with open('ft1.txt', 'r') as f:  # open the file
    list_strin = f.read().split('\n\n')  # save the strings in a list
I want the output [homework, security, information].
But the actual output is [home\nwo\nrk, sec\nurity, inform\natio\nn].
How can I remove the "\n" character from the individual strings and merge them correctly?

Each string still contains \n. Remove it :-) (this line goes inside the with block):
list_strin = [x.replace('\n', '') for x in f.read().strip().split('\n\n')]
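For illustration, here is a minimal sketch of the same split-and-join idea applied to an in-memory string instead of the file. The helper name join_wrapped is hypothetical, and the sample string is assumed to mirror what f.read() returns for the file shown in the question.

```python
def join_wrapped(text):
    """Split text into blank-line-separated blocks and glue each block's lines together."""
    blocks = text.strip().split('\n\n')
    # str.split() with no arguments splits on any whitespace, including '\n',
    # so joining the pieces removes the line breaks inside a block.
    return [''.join(block.split()) for block in blocks if block.strip()]


# Assumed sample, modeled on the file in the question.
sample = " \n\nhome\nwo\nrk\n\nsec\nurity\n\ninform\natio\nn "
print(join_wrapped(sample))
```

Stripping the whole text first avoids an extra whitespace-only entry at the start of the list.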
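readline solution:
res = []
s = ''
with open('ft1.txt', 'r') as f:
    line = f.readline()
    while line:
        line = line.strip()
        if line == '':
            # a blank line ends the current string
            if s:
                res.append(s)
                s = ''
        else:
            s += line
        line = f.readline()
# keep the last string if the file does not end with a blank line
if s:
    res.append(s)
print(res)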
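As an alternative sketch of the same line-by-line idea, itertools.groupby can group consecutive non-blank lines, so no manual accumulator is needed. The function name merge_blocks and the sample lines are assumptions made for this example.

```python
from itertools import groupby


def merge_blocks(lines):
    """Join runs of consecutive non-blank lines into single strings."""
    stripped = (line.strip() for line in lines)
    # key=bool is False for empty strings and True otherwise, so each group
    # is either a run of blank lines or a run of text lines.
    return [''.join(group) for nonblank, group in groupby(stripped, key=bool) if nonblank]


# Assumed sample lines, as a file iterator would yield them.
sample_lines = ['\n', 'home\n', 'wo\n', 'rk\n', '\n', 'sec\n', 'urity\n', '\n', 'inform\n', 'atio\n', 'n\n']
print(merge_blocks(sample_lines))
```

Because it only needs an iterable of lines, the same function should also accept an open file object directly.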

Related

How to select the last character of a header in a fasta file?

I have a fasta file like this:
>XP1987651-apple1
ACCTTCCAAGTAG
>XP1235689-lemon2
TTGGAGTCCTGAG
>XP1254115-pear1
ATGCCGTAGTCAA
I would like to create a file containing only the records whose header ends with '1', for example:
>XP1987651-apple1
ACCTTCCAAGTAG
>XP1254115-pear1
ATGCCGTAGTCAA
So far I have written this:
fasta = open('x.fasta')
output = open('x1.fasta', 'w')
seq = ''
for line in fasta:
    if line[0] == '>' and seq == '':
        header = line
    elif line[0] != '>':
        seq = seq + line
        for n in header:
            n = header[-1]
            if '1' in n:
                output.write(header + seq)
                header= line
                seq = ''
    if "1" in header:
        output.write(header + seq)
output.close()
However, it doesn't produce any output in the new file created. Can you please spot the error?
Thank you
One option would be to read the entire file into a string, and then use re.findall with the following regex pattern:
>[A-Z0-9]+-\w+1\r?\n[ACGT]+
Sample script:
import re
with open('x.fasta') as fasta:
    text = fasta.read()
matches = re.findall(r'>[A-Z0-9]+-\w+1\r?\n[ACGT]+', text)
print(matches)
For the sample data you gave above, this prints:
['>XP1987651-apple1\nACCTTCCAAGTAG', '>XP1254115-pear1\nATGCCGTAGTCAA']
You can start by getting a list of the individual records, which are delimited by '>', and extract the header and body of each with a single split on the first newline, .split('\n', 1):
records = [
    line.split('\n', 1)
    for line in fasta.read().split('>')[1:]
]
Then you can simply skip the records whose header does not end with 1:
for header, body in records:
    if header.endswith('1'):
        output.write('>' + header + '\n')
        output.write(body)
You can quite simply set a flag when you see a matching header line.
with open('x.fasta') as fasta, open('x1.fasta', 'w') as output:
    select = False  # nothing is selected before the first header
    for line in fasta:
        if line.startswith('>'):
            select = line.endswith('1\n')
        if select:
            output.write(line)
This avoids reading the entire file into memory; you are only examining one line at a time.
Note that line still contains the newline at the end of the line. I opted to simply keep it; sometimes things are easier if you trim it with line = line.rstrip('\n') and add it back on output if necessary.
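A self-contained sketch of the flag approach, written as a generator so it can be tried without any files on disk. The name filter_fasta and its suffix parameter are hypothetical, and io.StringIO stands in for the opened FASTA file.

```python
import io


def filter_fasta(lines, suffix='1'):
    """Yield every line of the records whose header ends with `suffix`."""
    keep = False  # no record is selected before the first header
    for line in lines:
        if line.startswith('>'):
            # Trim the line ending first so '\n' and '\r\n' files both work.
            keep = line.rstrip('\r\n').endswith(suffix)
        if keep:
            yield line


# io.StringIO plays the role of open('x.fasta') here.
sample = io.StringIO(
    ">XP1987651-apple1\nACCTTCCAAGTAG\n"
    ">XP1235689-lemon2\nTTGGAGTCCTGAG\n"
    ">XP1254115-pear1\nATGCCGTAGTCAA\n"
)
result = ''.join(filter_fasta(sample))
print(result, end='')
```

With real files, output.writelines(filter_fasta(fasta)) would write the selected records while still reading one line at a time.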

How to read a value from a tab-separated file in Python?

I have a text file with this format
ConfigFile 1.1
;
; Version: 4.0.32.1
; Date="2021/04/08" Time="11:54:46" UTC="8"
;
Name
    John Legend
Type
    Student
Number
    s1054520
I would like to get the value of Name, Type, or Number.
How do I get it?
I tried this method, but it does not solve my problem.
import re
f = open("Data.txt", "r")
file = f.read()
Name = re.findall("Name", file)
print(Name)
My expected output is John Legend.
Can anyone help me, please? I would really appreciate it. Thank you.
First of all, re.findall searches for all occurrences that match a given pattern. So in your case you are finding every "Name" in the file, because that is the pattern you passed.
On the other hand, the computer does not know that "John Legend" is the name; it only knows that it is the line after the word "Name".
In your case I suggest the following approach:
1. Find the line number of "Name".
2. Read the next line.
3. Strip the surrounding whitespace to get the name.
If there is more than one Name, this works as well.
The final code looks like this:
def search_string_in_file(file_name, string_to_search):
    """Search for the given string in file and return lines containing that string,
    along with line numbers"""
    line_number = 0
    list_of_results = []
    # Open the file in read only mode
    with open(file_name, 'r') as read_obj:
        # Read all lines in the file one by one
        for line in read_obj:
            # For each line, check if line contains the string
            line_number += 1
            if string_to_search in line:
                # If yes, then add the line number & line as a tuple in the list
                list_of_results.append((line_number, line.rstrip()))
    # Return list of tuples containing line numbers and lines where string is found
    return list_of_results
with open('Data.txt') as file:
    content = file.readlines()
matched_lines = search_string_in_file('Data.txt', 'Name')
print('Total Matched lines : ', len(matched_lines))
for i in matched_lines:
    # line numbers are 1-based, so content[i[0]] is the line after the match
    print(content[i[0]].strip())
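If a regex is still preferred, one possible sketch is to capture the line that follows the label rather than the label itself. The helper get_value is a hypothetical name, and the sample text assumes that each value sits on the line directly after its label.

```python
import re


def get_value(text, label):
    """Return the stripped line that follows a line consisting of `label`, or None."""
    # ^ and $ match at line boundaries because of re.MULTILINE.
    pattern = rf'^{re.escape(label)}[ \t]*\r?\n(.*)$'
    match = re.search(pattern, text, flags=re.MULTILINE)
    return match.group(1).strip() if match else None


# Assumed sample, modeled on the file in the question.
sample = (
    "ConfigFile 1.1\n"
    ";\n"
    "; Version: 4.0.32.1\n"
    ";\n"
    "Name\n"
    "    John Legend\n"
    "Type\n"
    "    Student\n"
    "Number\n"
    "    s1054520\n"
)
print(get_value(sample, "Name"))
```

Only the first occurrence of the label is returned; re.findall with the same pattern would collect every occurrence.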
Here I go through each line, and when I encounter Name I add the next line to the result list (you could also print it directly):
import re
def print_hi(name):
    result = []
    regexp = re.compile(r'Name')
    gotname = False
    with open('test.txt') as f:
        for line in f:
            # the previous line was a Name label, so this line is the value
            if gotname:
                result.append(line.strip())
                gotname = False
            match = regexp.match(line)
            if match:
                gotname = True
    print(result)
if __name__ == '__main__':
    print_hi('test')
Assuming those label lines appear in the file in the same sequence as in the list, you can simply scan for them:
labelList = ["Name","Type","Number"]
captures = dict()
with open("Data.txt","rt") as f:
    for label in labelList:
        while not f.readline().startswith(label):
            pass
        captures[label] = f.readline().strip()
for label in labelList:
    print(f"{label} : {captures[label]}")
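One caveat with this scan: f.readline() returns an empty string at end of file, so the while loop never ends if a label is missing. Below is a sketch of a variant that stops at end of file instead. The function name capture_labels is an assumption, and io.StringIO stands in for the real file.

```python
import io


def capture_labels(f, labels):
    """Scan f once, in label order, and map each found label to the line after it."""
    captures = {}
    for label in labels:
        for line in f:
            if line.startswith(label):
                # The value is assumed to be on the very next line.
                captures[label] = next(f, '').strip()
                break
        else:
            # The file ended before this label was found; stop scanning.
            break
    return captures


# Assumed sample, modeled on the file in the question.
sample = io.StringIO(
    "ConfigFile 1.1\n;\n; Version: 4.0.32.1\n;\n"
    "Name\n    John Legend\nType\n    Student\nNumber\n    s1054520\n"
)
print(capture_labels(sample, ["Name", "Type", "Number"]))
```

Labels that are never found are simply absent from the returned dict, so callers can use captures.get(label).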
I wouldn't use a regex, but rather make a parser for the file type. The rules might be:
1. The first line can be ignored.
2. Any line that starts with ; can be ignored.
3. Every line with no leading whitespace is a key.
4. Every line with leading whitespace is a value belonging to the last key.
I'd start with a generator that returns every line that is not ignored:
def read_data_lines(filename):
    with open(filename, "r") as f:
        # skip the first line
        f.readline()
        # read until no more lines
        while line := f.readline():
            # skip lines that start with ;
            if not line.startswith(";"):
                yield line
Then fill up a dict by following rules 3 and 4:
def parse_data_file(filename):
    data = {}
    key = None
    for line in read_data_lines(filename):
        # No leading space or tab makes this a key
        if not line.startswith((" ", "\t")):
            key = line.strip()
        # Leading whitespace makes this a value for the last key
        else:
            data[key] = line.strip()
    return data
Now at this point you can parse the file and print whatever key you want:
data = parse_data_file("Data.txt")
print(data["Name"])

How can I replace lines in a file by line number in Python, given a list of line numbers?

I have a file like this:
Flower
Magnet
5001
100
0
and I have a list containing the line numbers that I have to change:
list =[2,3]
How can I do this using Python? The output I expect is:
Flower
Most
Most
100
0
Code that I've tried:
f = open("your_file.txt","r")
line = f.readlines()[2]
print(line)
if line=="5001":
    print("yes")
else:
    print("no")
but the comparison never matches.
I also want to overwrite the file that I am reading.
You may simply loop through the list of indices that you have to replace in your file (my original answer needlessly looped through all lines in the file):
with open('test.txt') as f:
    data = f.read().splitlines()
replace = {1,2}
for i in replace:
    data[i] = 'Most'
print('\n'.join(data))
Output:
Flower
Most
Most
100
0
To overwrite the file you have opened with the replacements, you may use the following:
with open('test.txt', 'r+') as f:
    data = f.read().splitlines()
    replace = {1,2}
    for i in replace:
        data[i] = 'Most'
    f.seek(0)
    f.write('\n'.join(data))
    f.truncate()
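The indices above are 0-based, while the list in the question ([2, 3]) counts lines from 1. A small sketch of the conversion, kept free of file I/O so it can be tried on its own; the function name replace_lines and its parameters are assumptions for this example.

```python
def replace_lines(lines, line_numbers, replacement='Most'):
    """Return a copy of `lines` with the given 1-based line numbers replaced."""
    # Convert the 1-based line numbers to 0-based list indices.
    targets = {n - 1 for n in line_numbers}
    return [replacement if i in targets else line for i, line in enumerate(lines)]


original = ['Flower', 'Magnet', '5001', '100', '0']
updated = replace_lines(original, [2, 3])
print('\n'.join(updated))
```

The input list is left unchanged, and line numbers outside the file are ignored rather than raising IndexError.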
The reason you're having this problem is that when you read a line from a file opened in Python, you also get the newline character (\n) at the end. To solve this, you can use the str.strip() method, which removes leading and trailing whitespace, including the newline.
E.g.:
with open("your_file.txt", "r") as f:
    lines = f.readlines()
lineToCheck = lines[2].strip()
if lineToCheck == "5001":
    print("yes")
else:
    print("no")

Python filtering non alphanumeric not working properly

I have a text file with random letters, numbers, and other characters in it. I have to remove the special characters so that only alphanumeric ones remain, while printing the process.
Text file is like this:
fkdjks97#!%&jd
28e8uw99...
and so on
For some reason it's printing:
Line read' ,,s.8,ymsw5w-86
'
' ,,s.8,ymsw5w-86
'->' <filter object at 0x0000020406BC8550> '
These should go on only 2 lines, instead of 4. Like this:
Line read' ,,s.8,ymsw5w-86'
' ,,s.8,ymsw5w-86' -> 's8ymsw5w86'
My attempt:
file1 = open(textfile1,"r")
while True:
    line = file1.readline()
    line2 = filter(str.isalnum,line)
    print("Line read'", str(line), "'")
    print("'", str(line), "'->'", line2, "'")
    if len(line) == 0:
        break
filter() returns an iterator object; you'll need to actually iterate over it to pull out the results.
In this case, you want a string back, so you could use str.join() to do the iteration and put everything back into a single string:
line2 = ''.join(filter(str.isalnum, line))
Note that you shouldn't really need to use a while True loop with file1.readline() calls. You can use a for loop directly over the file to get the lines by replacing the while True, line = file1.readline() and if len(line) == 0: break lines with:
for line in file1:
    # ...
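To make the iterator behaviour concrete, here is a short sketch using the line from the question's output as an assumed sample. The variable names lazy and cleaned are chosen for this example only.

```python
# Assumed sample line, taken from the output shown in the question.
line = ' ,,s.8,ymsw5w-86\n'

lazy = filter(str.isalnum, line)  # an iterator; nothing has been filtered yet
cleaned = ''.join(lazy)           # join() consumes the iterator

# rstrip('\n') keeps the printed report on a single line.
print(repr(line.rstrip('\n')), '->', repr(cleaned))
```

Once consumed, the iterator is empty, so joining lazy a second time gives an empty string; call filter() again if the result is needed twice.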
You might be looking for a regex solution:
import re
# anything that is not an ASCII letter or digit
rx = re.compile(r'[^A-Za-z0-9]+')
# some sample line
line = 'fkdjks97#!%&jd'
# and then later on
line = rx.sub('', line)
print(line)
Which yields
# fkdjks97jd
Putting this in a with... construct, you might use
with open(textfile1, "r") as fp:
    for line in fp:
        print(rx.sub('', line))

Read text file to list in python

I want to create a text file that contains positive and negative numbers separated by ','.
I want to read this file and put its contents in data = []. I have written the code below, and I think it works well.
Do you know a better way to do it, or is it well written as is?
Thanks, all.
#!/usr/bin/python
if __name__ == "__main__":
    #create new file
    fo = open("foo.txt", "w")
    fo.write( "111,-222,-333");
    fo.close()
    #read the file
    fo = open("foo.txt", "r")
    tmp= []
    data = []
    count = 0
    tmp = fo.read() #read all the file
    for i in range(len(tmp)): #len is 13 in this case
        if (tmp[i] != ','):
            count+=1
        else:
            data.append(tmp[i-count : i])
            count = 0
    data.append(tmp[i+1-count : i+1])#append the last -333
    print(data)
    fo.close()
You can use the split method with a comma as the separator:
data = []
with open('foo.txt') as fin:
    for line in fin:
        # strip the trailing newline before splitting
        data.extend(line.strip().split(','))
Instead of looping through, you can just use split:
#!/usr/bin/python
if __name__ == "__main__":
    #create new file
    fo = open("foo.txt", "w")
    fo.write( "111,-222,-333");
    fo.close()
    #read the file
    with open('foo.txt', 'r') as file:
        data = [line.split(',') for line in file.readlines()]
    print(data)
Note that this gives back a list of lists, with each inner list coming from a separate line. In your example you only have one line. If your files will always have a single line, you can just take the first element, data[0].
To get the whole file content (positive and negative numbers) into a list, you can use replace and split:
with open('foo.txt') as fo:
    file_obj = fo.read()  # read the content into a string
list_numbers = file_obj.replace('\n', ',').split(',')  # split on ',' and newline
print(list_numbers)
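All of the approaches above produce strings such as '-222'. If actual numbers are wanted, a conversion step can be added. This is a sketch under the assumption that every token is a valid integer; the helper name parse_numbers is hypothetical.

```python
def parse_numbers(text):
    """Split text on commas and newlines and convert every token to int."""
    tokens = text.replace('\n', ',').split(',')
    # Skip empty tokens, e.g. the one produced by a trailing newline.
    return [int(tok) for tok in tokens if tok.strip()]


print(parse_numbers("111,-222,-333\n"))
```

int() accepts a leading minus sign and surrounding whitespace, but raises ValueError on anything else, which is one way to notice malformed input early.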
