Local Blast empty xml file python - python

I am trying to implement a little script in order to automatize a local blast alignment.
I had ran commands in the terminal en it works perfectly. However when I try to automatize this, I have a message like : Empty XML file.
Do we have to implement a "system" waiting time to let the file be written, or I did something wrong?
The code :
#sequence identifier as key, sequence as value.
for element in dictionnaryOfSequence:
#I make a little temporary fasta file because the blast command need a fasta file as input.
out_fasta = open("tmp.fasta", 'w')
query = ">" + element + "\n" + str(dictionnary[element])
out_fasta.write(query) # And I have this file with my sequence correctly filled
OUT_FASTA.CLOSE() # EDIT : It was out of my loop....
#Now the blast command, which works well in the terminal, I have my tmp.xml file well filled.
os.system("blastn -db reads.fasta -query tmp.fasta -out tmp.xml -outfmt 5 -max_target_seqs 5000")
#Parsing of the xml file.
handle = open("tmp.xml", 'r')
blast_records = NCBIXML.read(handle)
print blast_records
I have an Error : Your XML file was empty, and the blast_records object doesn't exist.
Did I make something wrong with handles?
I take all advice. Thank you a lot for your ideas and help.
EDIT : Problem solved, sorry for the useless question. I did wrong with handle and I did not open the file in the right location. Same thing with the closing.
Sorry.

try to open the file "tmp.xml" in Internet explorer. All tags are closed?

Related

Strange behavior when trying to create and write to a text file on macOS [duplicate]

This question already has answers here:
Convert UTF-8 with BOM to UTF-8 with no BOM in Python
(7 answers)
Closed last year.
I'm opening a plain text file, parsing it, and adding different lines to existing, empty string variables. I add these variables into a new variable that is a multi-line fstring. Trying to write the data to a new text file is not behaving as expected.
Reading the original file works fine. Text is properly parsed, variables populated.
The multi-line fstring variable seems fine. Prints normally. Even tried formatting it different ways which I show below.
When writing to a new file, that's where the strangeness starts. I've tried 2 ways:
Straight coding the open function with w or w+
Adding the above to a function and using that inside main()
The file is saved to disk with the correct name. Trying to double-click open in Finder produces nothing. Right-click to open produces nothing. Trying to move to trash with command+delete gives an error:
It sounds like the file goes to trash, but as the file disappears from the folder a new one is created with the same name in its place.
If I try to open in TextMate via File > Open, it opens as a blank file with no errors.
Since I can't get rid of the file, I have to delete the directory and create the directory again with the same name, or force delete in Terminal using rm. Restarting the system does not help. Relaunching Finder does nothing. Saving text files from other apps works fine. Directory is chmod 755.
If I copy an existing text file into the output directory, rename it to what the file is expected to be named, and let python overwrite the contents, it doesn't work either. The file modification date changes (and I see the file "blink" in Finder) but the contents remain the same. However, the file is not corrupted and opens normally.
If I do the same but delete the text inside of the copied file first, then run the script, python writes no data to the file, I can't open it by double-clicking on it, and I get error -43 again with the odd non-trashing behavior.
The strangest thing is this: if I add another with open() at the end of the script, and open the file that was just created and supposedly written to, and print its contents, the contents print. It's like when the script ends the file contents are being removed or its being corrupted somehow. Tried to close the file inside the script even though it's not needed, but same behavior persists.
Code:
Here's the code for writing:
FORMAT='utf-8'
OUTPUT_DIR = '/Path/To/SaveFolder'
# as a function
def write_to_file(content, fpath, name):
the_file = os.path.join(fpath, name)
with open(the_file, 'w+', encoding=FORMAT) as t:
t.write(content)
def main():
print(f" Writing File...\n")
filename = f"{pcode}_{author}_{title}_text.txt"
write_to_file(multiline_var, OUTPUT_DIR, filename)
# or hard coded in main()
def main():
print(f" Writing File...\n")
filename = f"{pcode}_{author}_{title}_text.txt"
the_file = os.path.join(OUTPUT_DIR, filename)
with open(the_file, 'w+', encoding=FORMAT) as t:
t.write(multiline_var)
I have tried using w w+ wt and wt+ and with and without encoding='utf-8'
Here is an example of multi-line fstring variable:
# using triple quotes
multiline_var = f"""
[PROJ-{pcode}] {full_title} by {author}
{description}
{URL}
{DIVIDER_1}
{TEXT_BLURB}
Some text here and then {SOME_MORE_TEXT}"
{DIVIDER_1}
{SOME_LINK}
"""
# or inside parens
multiline_var = (
f"[PROJ-{pcode}] {full_title} by {author}\n"
f"{description}\n\n"
f"{URL}\n"
f"{DIVIDER_1}\n"
f"{TEXT_BLURB}\n\n"
f"Some text here and then {SOME_MORE_TEXT}\n"
f"{DIVIDER_1}\n\n"
f"{SOME_LINK}"
)
Using exiftool on the text file shows the following, so it looks the data is there but must be corrupted:
File Size : 1797 bytes
File Modification Date/Time : 2021:12:31 15:55:39-05:00
File Access Date/Time : 2021:12:31 15:58:13-05:00
File Inode Change Date/Time : 2021:12:31 15:55:39-05:00
File Permissions : -rw-r--r--
File Type : TXT
File Type Extension : txt
MIME Type : text/plain
MIME Encoding : utf-8
Byte Order Mark : No
Newlines : Unix LF
Line Count : 55
Word Count : 181
Not sure what I'm doing wrong. VScode shows no syntax errors in the script. There are no errors in Terminal when running the script. Have I made some simple mistake in the above code? Maybe the fstring variable is causing a problem?
Thanks to #bnaecker for leading me to the solution to this problem.
It appeared that when creating/writing to a text file with a long name, Python can corrupt it. Not sure why, as I save long names for images with Python image libraries all the time. Using a short name like "MyFile.txt" it worked just fine, but that was a red herring.
I have updated this post with my journey to the final solution for using the long names that are needed for my project, though I'm not sure why the problem exists.
First Attempts:
So far creating using a short name and then renaming to a long one.... attempts have failed. I did notice that python is locking the file it creates and never unlocks it. Not sure if this is the problem. Setting chflags with os.system('chflags nouchg') command does not work, not even with sudo, and not even in the Terminal doing it manually.
Using os.rename() in Python corrupts the file
Using os.system('mv oldFile.txt newFile.txt') corrupts the file
Manually using mv command in Terminal corrupts the file
Manually changing the filename in the Finder does not (wtf?)
I kept looking for workarounds but nothing did the job.
Round 2:
Progress!
After much tinkering, I discovered a hidden character inside the file. I ran cat /path/longfilename.txt in Terminal, selected and copied the output and pasted into VScode. Here is what I saw:
Somehow a hidden character is getting into the project code number.
Pasting it into a Unicode search engine it came up as a ZERO WIDTH NO-BREAK SPACE also known in Unicode as EF BB BF. However, when pasting this symbol into TextMate it shows up as <U+FEFF> which is?...
The Byte Order Mark!
Opening a normal utf-8 text file in a hex editor also shows the files starting with EFBBBF for the BOM.
Now, the text file being read and parsed at first has no blank lines to start the file, so I added a line break, and also tried adding some spaces. This time when writing the file I could open it, however, after sending it to the trash, the same behavior occurred and the file was broken again. It seems that because other corrupted versions were in the trash, it added the symbol back to the file name for some reason.
So what appears to be happening, for whatever reason, when Python opens the text file I'm parsing that has no line break at the top, it seems to be grabbing the BOM from the file and adding that to the first variable which is grabbing the first line of the text file. Since that text is a number code that starts the file name, the BOM symbol is being added to the file name as well as the code inside the text file.
Just... wow
The Current Solution:
I have to leave a blank line at the start of the text file that I'm opening and parsing and a simple line break won't do it. I have no idea why this is. I added some spaces for good measure because randomly the BOM would be added to the variable and filename again. So far (knock on wood) as long as the first line of that initial file has some spaces and then a line break, and previous corrupted files have been deleted from the trash, a long file name can be used for all the files I'm creating and writing to without any problems.
This corruption even persists if I remove the encoding flag from both of the open functions I'm using (one to read and parse, the other to create and write).
If anyone knows why this is happening, please share. I've never seen it mentioned before. I'm not sure if it's a python 3.8 bug, a mac OS bug, the way TextMate wrote the original file, or a combination of these.
Correct Solution:
Thanks to #tripleee for the proper way to handle this, as I don't remember seeing this before, though I haven't been using python for very long.
In order to ignore the BOM, reading in the text file to be parsed with an encoding='utf-8-sig' does the job. Seems to be why it exists. :)
Problem solved.

Why does python keep throwing 'source code string cannot contain null bytes'

I'm running a program that reads a Dictionary off a file called questions.txt.
{
1:{0:'question',1:{1:'answer1a',2:'answer1b'},2:'answer2'},
2:{0:'question',1:'answer1',2:'answer2'},
3:{0:'question',1:'answer1',2:'answer2'},
4:{0:'question',1:'answer1',2:'answer2'}
}
I'm using this code from my file dictict.pyw to read parts of the Dictionary.
fQDict = open("questions.txt", "r")
QDict = " "
for i in fQDict.read().split():
QDict += i
fQDict.close()
QDict = eval(QDict)
print(QDict)
print(QDict[1])
print(QDict[1][0])
print(QDict[1][1])
print(QDict[1][1][1])
When I run the program python throws an error saying source code string cannot contain null bytes at the QDict = eval(QDict) line, why?
Your file contains null byte characters
sed -i 's/\x0//g' null.txt
This should help to remove these characters from your file.
for more reference see: https://stackoverflow.com/a/2399817/230468
Thank you to everyone for the advise and answers.
The solution i found was to change the file from wordpad to notepad, since wordpad embeds code into the file.

Tailing a log file

I want to add a log viewer tab to my website. The tab is supposed to print the whole log file, and after that print new lines (such as tail -F command in Linux) only. The client Side is in HTML and Javascript, and the server side is in Python.
Here is my tail Python function (I found it in the web):
#cherrypy.expose
def tail(self):
filename = '/opt/abc/logs/myLogFile.log'
f = subprocess.Popen(['tail','-F',filename],\
stdout=subprocess.PIPE,stderr=subprocess.PIPE)
p = select.poll()
p.register(f.stdout)
while True:
if p.poll(1):
print f.stdout.readline()
time.sleep(1)
This code is indeed printing the whole log file. However, each time I add new lines to the file, the file has been printed from the beginning, instead of printing the new lines.
Any suggestions how to fix it? I'm pretty new in Python, so I would appreciate any kind of help.
Check out pytailer
https://github.com/six8/pytailer
Specifically the follow command:
# Follow the file as it grows
for line in tailer.follow(open('/opt/abc/logs/myLogFile.log')):
print line

How Can I Remove Skipped Lines from Pastebin Output?

I am trying to use Pastebin to host two text files for me to allow any copy of my script to update itself through the internet. My code is working, but the resultant .py file has a blank line added between each line. Here is my script...
import os, inspect, urllib2
runningVersion = "1.00.0v"
versionUrl = "http://pastebin.com/raw.php?i=3JqJtUiX"
codeUrl = "http://pastebin.com/raw.php?i=GWqAQ0Xj"
scriptFilePath = (os.path.abspath(inspect.getfile(inspect.currentframe()))).replace("\\", "/")
def checkUpdate(silent=1):
# silently attempt to update the script file by default, post messages if silent==0
# never update if "No_Update.txt" exists in the same folder
if os.path.exists(os.path.dirname(scriptFilePath)+"/No_Update.txt"):
return
try:
versionData = urllib2.urlopen(versionUrl)
except urllib2.URLError:
if silent==0:
print "Connection failed"
return
currentVersion = versionData.read()
if runningVersion!=currentVersion:
if silent==0:
print "There has been an update.\nWould you like to download it?"
try:
codeData = urllib2.urlopen(codeUrl)
except urllib2.URLError:
if silent==0:
print "Connection failed"
return
currentCode = codeData.read()
with open(scriptFilePath.replace(".py","_UPDATED.py"), mode="w") as scriptFile:
scriptFile.write(currentCode)
if silent==0:
print "Your program has been updated.\nChanges will take effect after you restart"
elif silent==0:
print "Your program is up to date"
checkUpdate()
I stripped the GUI (wxpython) and set the script to update another file instead of the actual running one. The "No_Update" bit is for convenience while working.
I noticed that opening the resultant file with Notepad does not show the skipped lines, opening with Wordpad gives a jumbled mess, and opening with Idle shows the skipped lines. Based on that, this seems to be a formatting problem even though the "raw" Pastebin file does not appear to have any formatting.
EDIT: I could just strip all blank lines or leave it as is without any problems, (that I've noticed) but that would greatly reduce readability.
Try adding the binary qualifier in your open():
with open(scriptFilePath.replace(".py","_UPDATED.py"), mode="wb") as scriptFile:
I notice that your file on pastebin is in DOS format, so it has \r\n in it. When you call scriptFile.write(), it translates \r\n to \r\r\n, which is terribly confusing.
Specifying "b" in the open() will cause scriptfile to skip that translate and write the file is DOS format.
In the alternative, you could ensure that the pastebin file has only \n in it, and use mode="w" in your script.

Find a line of text with a pattern using Windows command line or Python

I need to run a command line tool that verifies a file and displays a bunch of information about it. I can export this information to a txt file but it includes a lot of extra data. I just need one line for the file:
"The signature is timestamped: Thu May 24 17:13:16 2012"
The time could be different, but I just need to extract this data and put it into a file. Is there a good way to do this from the command line itself or maybe python? I plan on using Python to locate and download the file to be verified, then run the command line tool to verify it so it can get the data then send that data in an email.
This is on a windows PC.
Thanks for your help
You don't need to use Python to do this. If you're using a Unix environment, you can use fgrep right from the command-line and redirect the output to another file.
fgrep "The signature is timestamped: " input.txt > output.txt
On Windows you can use:
find "The signature is timestamped: " < input.txt > output.txt
You mention the command line utility "displays" some information, so it may well be printing to stdout, so one way is to run the utility within Python, and capture the output.
import subprocess
# Try with some basic commands here maybe...
file_info = subprocess.check_output(['your_command_name', 'input_file'])
for line in file_info.splitlines():
# print line here to see what you get
if file_info.startswith('The signature is timestamped: '):
print line # do something here
This should fit in nicely with the "use python to download and locate" - so that can use urllib.urlretrieve to download (possibly with a temporary name), then run the command line util on the temp file to get the details, then the smtplib to send emails...
In python you can do something like this:
timestamp = ''
with open('./filename', 'r') as f:
timestamp = [line for line in f.readlines() if 'The signature is timestamped: ' in line]
I haven't tested this but I think it'd work. Not sure if there's a better solution.
I'm not too sure about the exact syntax of this exported file you have, but python's readlines() function might be helpful for this.
h=open(pathname,'r') #opens the file for reading
for line in h.readlines():
print line#this will print out the contents of each line of the text file
If the text file has the same format every time, the rest is easy; if it is not, you could do something like
for line in h.readlines():
if line.split()[3] == 'timestamped':
print line
output_string=line
as for writing to a file, you'll want to open the file for writing, h=open(name, "w"), then use h.write(output_string) to write it to a text file

Categories