How to fetch data from a text file using regex in Python?

I have a txt file with lots of information, but I only want the lines that start like this:
1. #BEGIN_DRUGCARD DB00001
2. # Generic_Name:
Lepirudin
In the first case, I want what follows the marker (DB00001).
In the second case, I want what is on the next line (Lepirudin). Then I want to save both of them to a text file.
I have the following script, but it's not working. I get the following error:
Traceback (most recent call last):
File "/home/viki/workspace/prbb/drugnames", line 22, in
drug_id = line()
TypeError: 'str' object is not callable
Any ideas?
import re
regex1 = '#BEGIN_DRUGCARD '
regex2 = '# Generic_Name:'
x=y=0
e = open ('drugbank.txt', 'r')
f = open ('Drug_output.txt', 'w')
for line in e.readlines():
    if re.match(regex1, line):
        y=1
        continue
    elif re.match(regex2, line):
        x=1
        continue
    if y==1:
        drug_id = line()
    if x==1:
        generic_name = line.split()
f.write('drug_id')
f.write('\n\n')
f.write('generic_name')

line() means "call the function named line", and of course this can't work because line is a string.
But there are several other problems with your code as well. It will only find the last matches in your drugbank.txt file, because it overwrites all the previous matches before writing anything to the file. And when it does write something, it writes the literal text drug_id instead of the contents of the variable drug_id. Also, you're using split() wrong. Have you read a Python tutorial?
Assuming that your drugbank.txt contains several drugs, and that each drug's ID and generic name always follow each other, you could do the job like this:
import re
regex = r'#BEGIN_DRUGCARD\s*(.*)\s*# Generic_Name:\s*(.*)'
with open('drugbank.txt', 'r') as infile:
    drugs = infile.read()
results = re.findall(regex, drugs)
with open('Drug_output.txt', 'w') as outfile:
    for match in results:
        outfile.write(match[0] + "\n" + match[1] + "\n\n")
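To see what the pattern captures, here is a minimal sketch that runs the same regular expression on an inline string instead of drugbank.txt. The first record comes from the question; the second record (DB99999, Exampledrug) and the extra field are invented purely for illustration. It assumes the "# Generic_Name:" header always follows the drug-card line directly.

```python
import re

# Same pattern as in the answer: group 1 is the ID after the card marker,
# group 2 is the line following the "# Generic_Name:" header.
REGEX = r'#BEGIN_DRUGCARD\s*(.*)\s*# Generic_Name:\s*(.*)'

# Inline sample; the second record and "# Some_Other_Field:" are hypothetical.
sample = (
    "#BEGIN_DRUGCARD DB00001\n"
    "# Generic_Name:\n"
    "Lepirudin\n"
    "# Some_Other_Field:\n"
    "ignored\n"
    "#BEGIN_DRUGCARD DB99999\n"
    "# Generic_Name:\n"
    "Exampledrug\n"
)

# findall returns one (drug_id, generic_name) tuple per drug card.
results = re.findall(REGEX, sample)
for drug_id, generic_name in results:
    print(drug_id, generic_name)
```

Because `.` does not match a newline by default, each group stops at the end of its own line, which is why the whole file can be searched in one call.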

Related

How do I print only the first instance of a string in a text file using Python?

I am trying to extract data from a .txt file in Python. My goal is to capture the last occurrence of a certain word and show the next line, so I reverse the text and read from the end. In this case, I search for the word 'MEC' and show the next line, but I capture all occurrences of the word, not just the first one found.
Any idea what I need to do?
Thanks!
This is what my code looks like:
import re
from file_read_backwards import FileReadBackwards
with FileReadBackwards("camdex.txt", encoding="utf-8") as file:
    for l in file:
        lines = l
        while line:
            if re.match('MEC', line):
                x = (file.readline())
                x2 = (x.strip('\n'))
                print(x2)
                break
            line = file.readline()
The txt file contains this:
MEC
29/35
MEC
28,29/35
My code prints this output:
28,29/35
29/35
My objective is to print only this:
28,29/35
This will give you the result as well. Loop through the lines and add the line after each match to a list. Then print the last element.
import re
with open("data/camdex.txt", encoding="utf-8") as file:
    result = []
    for line in file:
        if re.match('MEC', line):
            x = next(file, '')  # the value is on the line after 'MEC'
            result.append(x.strip('\n'))
print(result[-1])
Get rid of the extra imports and overhead. Read your file normally, remembering the last line that qualifies.
last = ""
with open("camdex.txt", encoding="utf-8") as file:
    for line in file:
        if line.startswith("MEC"):
            last = line
# Assumes the value is on the same line as "MEC", e.g. "MEC 28,29/35".
print(last[4:-1]) # "4" gets rid of "MEC "; "-1" stops just before the line feed.
If the file is very large, then reading backwards makes sense -- seeking to the end and backing up will be faster than reading to the end.
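For the layout shown in the question, where the value sits on the line after MEC, a forward scan can remember the line that follows each match and keep only the last one. This is a minimal sketch under that assumption; `last_value_after` is a hypothetical helper name, and `io.StringIO` stands in for the real file.

```python
import io

def last_value_after(lines, marker="MEC"):
    """Return the line following the last line that starts with `marker`.

    Returns None when no line starts with `marker`.
    """
    last = None
    it = iter(lines)
    for line in it:
        if line.startswith(marker):
            # Consume the following line; "" if the marker is the final line.
            last = next(it, "").strip()
    return last

# The sample content from the question, held in memory instead of a file.
sample = io.StringIO("MEC\n29/35\nMEC\n28,29/35\n")
print(last_value_after(sample))  # prints 28,29/35
```

The same function accepts an open file object, since it only needs something that yields lines.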

Python Insert text before a specific line

I want to insert the text 'Hello Everyone' before the first line starting with 'Number'.
My code:
import re
result = []
with open("text2.txt", "r+") as f:
    a = [x.rstrip() for x in f] # stores all lines from f into an array and removes "\n"
    # Find the first occurrence of "Centre" and store its index
    for item in a:
        if item.startswith("Number"): # same as your re check
            break
    ind = a.index(item) #here it produces index no./line no.
    result.extend(a[:ind])
    f.write('Hello Everyone')
Text file:
QWEW
RW
...
Number hey
Number ho
Expected output:
QWEW
RW
...
Hello Everyone
Number hey
Number ho
Please help me fix my code: nothing gets inserted into my text file.
Answers will be appreciated!
The problem
When you do open("text2.txt", "r"), you open your file for reading, not for writing. Therefore, nothing appears in your file.
The fix
Using r+ instead of r allows you to also write to the file (this was also pointed out in the comments). However, it overwrites existing content rather than inserting, so be careful (this is an OS limitation). The following should do what you desire: it inserts "Hello everyone" into the list of lines and then overwrites the file with the updated lines.
with open("text2.txt", "r+") as f:
    a = [x.rstrip() for x in f]
    index = 0
    for item in a:
        if item.startswith("Number"):
            a.insert(index, "Hello everyone") # Inserts "Hello everyone" into `a`
            break
        index += 1
    # Go to start of file and clear it
    f.seek(0)
    f.truncate()
    # Write each line back
    for line in a:
        f.write(line + "\n")
The correct answer to your problem is hlt's, but consider also using the fileinput module:
import fileinput
found = False
# With inplace=True, everything printed goes into the file instead of the console
for line in fileinput.input('DATA', inplace=True):
    if not found and line.startswith('Number'):
        print('Hello everyone')
        found = True
    print(line, end='')
This is basically the same as an earlier question, where the proposed approach has three steps: read everything, insert, rewrite everything.
with open("/tmp/text2.txt", "r") as f:
    lines = f.readlines()
for index, line in enumerate(lines):
    if line.startswith("Number"):
        # Insert only when a matching line was actually found
        lines.insert(index, "Hello everyone !\n")
        break
with open("/tmp/text2.txt", "w") as f:
    f.writelines(lines)
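The answers above all share the same core step: find the first line with the given prefix and insert the new line in front of it. A minimal sketch of just that step, kept free of file I/O so it is easy to check; `insert_before_first` is a hypothetical helper name, not something from the answers.

```python
def insert_before_first(lines, prefix, new_line):
    """Return a copy of `lines` with `new_line` inserted before the first
    line starting with `prefix`. The copy is unchanged if nothing matches."""
    result = list(lines)
    for index, line in enumerate(result):
        if line.startswith(prefix):
            result.insert(index, new_line)
            break
    return result

# The sample file from the question, as a list of lines.
lines = ["QWEW\n", "RW\n", "Number hey\n", "Number ho\n"]
updated = insert_before_first(lines, "Number", "Hello Everyone\n")
print("".join(updated), end="")
```

Reading the lines with `readlines()` and writing `updated` back with `writelines()` would complete the read, insert, rewrite cycle.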

Find a string and insert text after it in Python

I am still a learner in Python. I was not able to find a specific string in a file and insert multiple strings after it. I want to search for the line in the file and insert the content passed to the write function after it.
I have tried the following, which inserts at the end of the file instead.
line = '<abc hij kdkd>'
dataFile = open('C:\\Users\\Malik\\Desktop\\release_0.5\\release_0.5\\5075442.xml', 'a')
dataFile.write('<!--Delivery Date: 02/15/2013-->\n<!--XML Script: 1.0.0.1-->\n')
dataFile.close()
You can use fileinput to modify the same file in place and re to search for a particular pattern:
import fileinput, re, sys
def modify_file(file_name, pattern, value=""):
    # With inplace=True, everything written to stdout goes into the file
    with fileinput.input(file_name, inplace=True) as fh:
        for line in fh:
            sys.stdout.write(line)
            if re.search(pattern, line):
                # Write the new text right after the matching line
                sys.stdout.write(value)
You can call this function something like this:
modify_file("C:\\Users\\Malik\\Desktop\\release_0.5\\release_0.5\\5075442.xml",
            "abc..",
            "<!--Delivery Date: 02/15/2013-->\n<!--XML Script: 1.0.0.1-->\n")
Python strings are immutable, which means that you wouldn't actually modify the input string. You would create a new one which has the first part of the input string, then the text you want to insert, then the rest of the input string.
You can use the find method on Python strings to locate the text you're looking for:
def insertAfter(haystack, needle, newText):
    """ Inserts 'newText' into 'haystack' right after 'needle'. """
    i = haystack.find(needle)
    if i == -1:
        return haystack  # 'needle' not found: return the string unchanged
    return haystack[:i + len(needle)] + newText + haystack[i + len(needle):]
You could use it like
print(insertAfter("Hello World", "lo", " beautiful")) # prints 'Hello beautiful World'
Here is a suggestion for dealing with files. I assume the pattern you search for is a whole line (there is nothing more on the line than the pattern, and the pattern fits on one line).
line = ... # What to match
input_filepath = ... # input full path
output_filepath = ... # output full path (must be different than input)
encoding = "utf-8" # assumed here; use the actual encoding of your file
all_data_to_insert = ... # text to write after the matching line
with open(input_filepath, "r", encoding=encoding) as fin, \
     open(output_filepath, "w", encoding=encoding) as fout:
    pattern_found = False
    for theline in fin:
        # Write input to output unmodified
        fout.write(theline)
        # if you want to get rid of spaces
        theline = theline.strip()
        # Find the matching pattern
        if pattern_found is False and theline == line:
            # Insert extra data in output file
            fout.write(all_data_to_insert)
            pattern_found = True
    # Final check
    if pattern_found is False:
        raise RuntimeError("No data was inserted because line was not found")
This code is for Python 3; some modifications may be needed for Python 2, especially the with statement (see contextlib.nested). If your pattern fits in one line but is not the entire line, you may use "line in theline" instead of "theline == line". If your pattern can spread over more than one line, you need a stronger algorithm. :)
To write to the same file, you can write to another file and then move the output file over the input file. I didn't plan to release this code, but I was in the same situation some days ago. So here is a class that inserts content in a file between two tags and supports writing to the input file: https://gist.github.com/Cilyan/8053594
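The write-then-move idea can also be sketched with the standard library alone. This is an illustration under stated assumptions, not the code from the linked gist: `insert_after_line` is a hypothetical helper, the target is assumed to be a whole line, and the encoding defaults to UTF-8.

```python
import os
import tempfile

def insert_after_line(path, target, text, encoding="utf-8"):
    """Insert `text` after the first line equal to `target` (ignoring
    surrounding whitespace) by rewriting `path` through a temporary file.
    Returns True if the line was found, False otherwise."""
    directory = os.path.dirname(os.path.abspath(path))
    found = False
    # Create the temporary file next to the original, so the final
    # os.replace stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding=encoding) as fout, \
             open(path, "r", encoding=encoding) as fin:
            for line in fin:
                fout.write(line)
                if not found and line.strip() == target:
                    fout.write(text)
                    found = True
        # Move the finished copy over the input file.
        os.replace(tmp_path, path)
    except BaseException:
        # Something failed before the move: discard the partial copy.
        os.remove(tmp_path)
        raise
    return found
```

A call such as `insert_after_line(path, '<abc hij kdkd>', '<!--XML Script: 1.0.0.1-->\n')` would leave the original untouched if anything fails before the final move. One caveat: the replaced file takes on the temporary file's permissions, which may differ from the original's.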
Frerich Raabe...it worked perfectly for me...good one...thanks!!!
import shutil
def insertAfter(haystack, needle, newText):
    """ Inserts 'newText' into 'haystack' right after 'needle'. """
    i = haystack.find(needle)
    if i == -1:
        return haystack  # leave lines without 'needle' unchanged
    return haystack[:i + len(needle)] + newText + haystack[i + len(needle):]
# sddraft holds the path of the file to read
with open(sddraft) as f1, open("<path to your file>", 'a+') as tf:
    # Read lines in the file and replace the required content
    for line in f1:
        build = insertAfter(line, "<string to find in your file>", "<new value to be inserted after the string is found in your file>") # inserts value
        tf.write(build)
shutil.copy("<path to the source file --> tf>", "<path to the destination where tf needs to be copied with the file name>")
Hope this helps someone:)

Finding errors in a file

I have a huge file whose contents are generated by running an executable over and over on different input files. The file follows this pattern: a file name followed by an arbitrary number of text lines. I have to pick up the name of the file whenever there is an error reading the input data, and I am not sure what the best way to do it is. Another problem is that the word "error" comes up every time anyway in a phrase (Final fitting error : (some numerical value)) which needs to be ignored.
C:\temptest\blahblah1
.. (arbitrary # of text lines)
Final fitting error : (some number) [I have to ignore this]
C:\temptest\blahblah2
.. (arbitrary # of text lines)
Error could not read data !** [I have to pick up blahblah2 and copy the file to another directory, but just logging the name would suffice]
Thanks in advance !
This should do more or less what you need:
file_name = None
with open("your_file.txt") as f:
    for line in f:
        if line.startswith("C:\\"):
            # Remember the most recent file name, without the line feed
            file_name = line.strip()
        elif line.startswith("Error"):
            print("Error for file " + file_name)
Assumptions:
- File names will start with "C:\", if that isn't true use a regular expression to perform a more accurate match or insert a special character before new files as you mentioned in a comment.
- There will only be one error per file, or printing multiple errors for a file is not a problem. If that is not the case, set some flag when you first print an error for a file and skip all subsequent errors until you find a new file.
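The flag mentioned in the second assumption could look like the sketch below. It works on any iterable of lines and reports each failing file once. `files_with_read_errors` is a hypothetical helper name, and the numeric value and extra output line in the sample log are made up for illustration.

```python
def files_with_read_errors(lines, name_prefix="C:\\"):
    """Return each file name (once) whose section has a line starting with "Error"."""
    failed = []
    current = None      # file name of the section being read
    reported = False    # True once `current` has been added to `failed`
    for line in lines:
        line = line.strip()
        if line.startswith(name_prefix):
            current = line
            reported = False
        elif line.startswith("Error") and current is not None and not reported:
            failed.append(current)
            reported = True
    return failed

# Sample log in the question's layout; 0.5 and "some output" are placeholders.
log = [
    "C:\\temptest\\blahblah1\n",
    "some output\n",
    "Final fitting error : 0.5\n",
    "C:\\temptest\\blahblah2\n",
    "Error could not read data !**\n",
    "Error could not read data !**\n",
]
print(files_with_read_errors(log))
```

"Final fitting error" is skipped because the check uses `startswith("Error")`, which is case-sensitive and anchored at the start of the line.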
So your log file looks like
{filepath}\file1
{
multiple lines
}
Final fitting error : 3.2
{filepath}\file2
{
multiple lines
}
Error could not read data !
and you want a list of all filenames resulting in the 'Error could not read data' message?
import re
import os.path
# Use set literals: set("...") would build a set of single characters
skipErrs = {"Final fitting error"}
saveErrs = {"Error could not read data"}
LOOKFOR = re.compile('(' + '|'.join(skipErrs | saveErrs) + ')')
class EOF_Exception(Exception): pass
def getLine(f):
    t = f.readline()
    if t=='':
        raise EOF_Exception('found end of file')
    else:
        return t.strip()
def getFilePath(f):
    return os.path.normpath(getLine(f))
errorfiles = []
with open('logfile.txt') as inf:
    while True:
        try:
            filepath = getFilePath(inf)
            # Skip ahead to the line that ends this file's section
            s = getLine(inf)
            m = LOOKFOR.match(s)
            while not m:
                s = getLine(inf)
                m = LOOKFOR.match(s)
            if m.group(1) in saveErrs:
                errorfiles.append(filepath)
        except EOF_Exception:
            break
With special being whatever header you prepend to the file-name lines:
[line[len(special):].strip() for line in file if line.startswith(special)]
You could use regexes also, but it will be more robust to add your own header, unless you are sure arbitrary lines could not start with a valid file name.
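import shutil
another_directory = ... # destination directory for the failing files
filename = None
with open("file") as f, open("log", "a") as o:
    for line in f:
        if line.lstrip().startswith("C:"):
            filename = line.strip()
        elif line.lstrip().startswith("Error"): # does not match "Final fitting error"
            o.write(filename + "\n")
            shutil.move(filename, another_directory)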

Searching for text in a file

Hi there, I've got a couple of problems. Say in my text file I have:
abase
abased
abasement
abasements
abases
The code below is meant to find a word in a file and print all the lines from there to the end of the file. But it doesn't: it only prints my search term and not the rest of the file.
search_term = r'\b%s\b' % search_term
for line in open(f, 'r'):
    if re.match(search_term, line):
        if search_term in line:
            f = 1
        if f: print line,
Say I searched for abasement; I would like the output to be:
abasement
abasements
abases
My final problem: I would like to search a file and print the lines my search term is in, plus a number of lines before and after the search term. If I searched the text example above for 'abasement' and defined the number of lines to print either side as 1, my output would be:
abased
abasement
abasements
numb = ' the number of lines to print either side of the search line '
search_term = 'what i search'
f=open("file")
d={}
for n,line in enumerate(f):
    d[n%numb]=line.rstrip()
    if search_term in line:
        for i in range(n+1,n+1+numb):
            print d[i%numb]
        for i in range(1,numb):
            print f.next().rstrip()
For the first part of the question, unindent your if f: print line,. Otherwise, you're only trying to print when the regex matches.
It's not clear to me what your question is in the second part. I see what you're trying to do, and your code, but you've not indicated how it misbehaves.
For the first part the algorithm goes like this (in pseudo code):
found = False
for every line in the file:
    if line contains search term:
        found = True
    if found:
        print line
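The pseudocode above, and the context-lines variant from the second part of the question, might be written in Python 3 roughly as follows. This is a sketch under assumptions: both helper names are hypothetical, the search term is matched as a whole word (as the `\b` pattern in the question suggests), and the functions return lists instead of printing so they are easy to check.

```python
import re
from collections import deque

def lines_from_match(lines, search_term):
    """Return the first line containing `search_term` as a whole word,
    plus every line after it."""
    pattern = re.compile(r'\b%s\b' % re.escape(search_term))
    found = False
    out = []
    for line in lines:
        if not found and pattern.search(line):
            found = True
        if found:
            out.append(line.rstrip("\n"))
    return out

def lines_with_context(lines, search_term, numb=1):
    """Return each matching line with up to `numb` lines before and after it."""
    pattern = re.compile(r'\b%s\b' % re.escape(search_term))
    before = deque(maxlen=numb)  # most recent lines not yet output
    after = 0                    # trailing context lines still owed
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if pattern.search(line):
            out.extend(before)   # context before the match
            before.clear()
            out.append(line)
            after = numb
        elif after > 0:
            out.append(line)     # context after the match
            after -= 1
        else:
            before.append(line)
    return out

words = ["abase\n", "abased\n", "abasement\n", "abasements\n", "abases\n"]
print(lines_from_match(words, "abasement"))
print(lines_with_context(words, "abasement", numb=1))
```

The `deque` with `maxlen=numb` keeps only the most recent lines, so the file never has to be held in memory as a whole. Both functions accept an open file object in place of the list.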
