Muscle alignment in Python

I have a problem with printing my output from muscle aligning in python. My code is:
from Bio.Align.Applications import MuscleCommandline
from StringIO import StringIO
from Bio import AlignIO
def align_v1 (Fasta):
    muscle_cline = MuscleCommandline(input="hiv_protease_sequences_w_wt.fasta")
    stdout, stderr = muscle_cline()
    MultipleSeqAlignment = AlignIO.read(StringIO(stdout), "fasta")
    print MultipleSeqAlignment
Any help?

It would be nice to know what error you received, but the following should solve your problem:
from Bio.Align.Applications import MuscleCommandline
from StringIO import StringIO
from Bio import AlignIO
muscle_exe = r"C:\muscle3.8.31_i86win32.exe" #specify the location of your muscle exe file
input_sequences = "hiv_protease_sequences_w_wt.fasta"
output_alignment = "output_alignment.fasta"
def align_v1 (Fasta):
    muscle_cline = MuscleCommandline(muscle_exe, input=Fasta, out=output_alignment)
    stdout, stderr = muscle_cline()
    MultipleSeqAlignment = AlignIO.read(output_alignment, "fasta")
    print MultipleSeqAlignment
align_v1(input_sequences)
In my case I received a ValueError:
>>> AlignIO.read(StringIO(stdout), "fasta")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\WinPython-64bit-3.3.2.3\python-3.3.2.amd64\lib\site-packages\Bio\AlignIO\__init__.py", line 427, in read
raise ValueError("No records found in handle")
ValueError: No records found in handle
This could be avoided by saving the output and reopening with AlignIO.read.
I also received a FileNotFoundError, which can be avoided by specifying the location of the muscle exe file, e.g.:
muscle_exe = r"C:\muscle3.8.31_i86win32.exe"
The instructions for this are shown in help(MuscleCommandline), but this is not currently in the Biopython tutorial page.
Finally, I am assuming you want to run the command on different input sequences, so I modified the function to the form “function_name(input_file)”.
I tested with Python 3.3; the code above should work on Python 2.x, as in your original post. For Python 3.x, change "from StringIO import StringIO" to "from io import StringIO" and “print MultipleSeqAlignment” to “print(MultipleSeqAlignment)”.
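The "No records found in handle" error is what AlignIO.read raises when the handle it is given holds no records, for example when the captured stdout is empty. As a minimal sketch, a guard like the one below could check the captured text before it is parsed. The helper name fasta_handle_or_none is hypothetical and not part of Biopython; the import fallback follows the Python 2/3 note above.

```python
try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3


def fasta_handle_or_none(captured_stdout):
    """Return an in-memory handle for FASTA text, or None if there is none.

    Hypothetical helper: it only checks that the text starts with a FASTA
    header line ('>'); it does not validate the alignment itself.
    """
    text = captured_stdout or ""
    if not text.lstrip().startswith(">"):
        return None
    return StringIO(text)
```

If the helper returns None, the alignment was probably not written to stdout, and reading the file named in out= (as in the answer above) is the safer route.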

Related

Import own python modules in nextflow script block?

I created a python script called utilities.py in bin/ directory:
#!/usr/bin/env python3
import numpy as np
import pandas as pd
from datetime import datetime
import io
def print_info(in_df, fname_base):
    buffer = io.StringIO()
    df = in_df.copy()
    df.info(buf=buffer)
    s = buffer.getvalue()
    with open(fname_base+"_info.txt", "w", encoding="utf-8") as f:
        f.write(s)
def print_desc(in_df, fname_base):
    df = in_df.copy()
    desc = df.describe()
    desc.to_csv(fname_base+"_desc.tsv", sep = '\t')
def print_data(in_df, fname_base):
    df = in_df.copy()
    print_info(df, fname_base)
    print_desc(df, fname_base)
    df.to_csv(fname_base+".tsv", sep = '\t')
and made it executable with chmod +x. I would like to use these functions in several script blocks across various processes in my workflow. Currently, when I try importing a function from my utilities module:
#!/bin/bash nextflow
process transform_data {
input:
path(data)
output:
path("out.tsv"), emit: out_data
script:
"""
#!/usr/bin/env python3
import pandas as pd
import io
from utilities import print_info
"""
}
I get the following error:
Traceback (most recent call last):
File ".command.sh", line 4, in <module>
from utilities import print_info
ModuleNotFoundError: No module named 'utilities'
Is it possible to import own modules in this way?
Which version of Nextflow are you using?
I tested with v22.04.5 and the following works:
My setup is a little different: instead of specifying #!/usr/bin/env python3 in the script block, I directly invoked a Python script (test.py) that contains from utilities import print_info, and it works fine.
script:
"""
test.py
"""
Note that a relative import, from .utilities import print_info, won't work. So yes, you can import a custom Python module with Nextflow.
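Nextflow places the project's bin/ directory on PATH for each task, which is why test.py can be invoked by name; an inline Python script block does not automatically get that directory on its module search path, as the ModuleNotFoundError in the question shows. One possible workaround, sketched below under that assumption, looks the module file up on PATH and adds its directory to sys.path before importing. The helper name add_script_dir_to_sys_path is hypothetical.

```python
import os
import shutil
import sys


def add_script_dir_to_sys_path(script_name):
    """Find an executable script on PATH and make its directory importable.

    Returns the script's directory, or None if the script is not on PATH.
    """
    location = shutil.which(script_name)
    if location is None:
        return None
    directory = os.path.dirname(os.path.abspath(location))
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory


# Hypothetical use inside a script block, assuming bin/utilities.py is executable:
# add_script_dir_to_sys_path("utilities.py")
# from utilities import print_info
```

Calling a standalone script from bin/, as in the answer above, avoids the path manipulation entirely and is probably the simpler choice when the logic fits in one file.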

Unable to read dicom file with Python3 and pydicom

I am trying to read a DICOM file with Python 3 and the pydicom library. For some DICOM data, I can't read the data correctly and get an error message when I try to print the result of pydicom.dcmread.
However, when I tried Python 2, it worked well. I checked the meta information and compared it with other DICOM files that can be processed, but I didn't find any difference between them.
import pydicom
ds = pydicom.dcmread("xxxxx.dicom")
print(ds)
Traceback (most recent call last):
File "generate_train_data.py", line 387, in <module>
tf.app.run()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "generate_train_data.py", line 371, in main
create_ann()
File "generate_train_data.py", line 368, in create_ann
ds_ann_dir, case_name, merge_channel=False)
File "generate_train_data.py", line 290, in process_dcm_set
all_dcms, dcm_truth_infos = convert_dicoms(dcm_list, zs)
File "generate_train_data.py", line 179, in convert_dicoms
instance_num, pixel_spacing, img_np = extract_info(dcm_path)
File "generate_train_data.py", line 147, in extract_info
print(ds)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 2277-2279: ordinal not in range(128)
Has anyone come across the same problem?
Can you give an example of such a DICOM file? Running the pydicom example with Python 3.7 works perfectly:
import matplotlib.pyplot as plt
import pydicom
from pydicom.data import get_testdata_files
filename = get_testdata_files("CT_small.dcm")[0]
ds = pydicom.dcmread(filename)
plt.imshow(ds.pixel_array, cmap=plt.cm.bone)
It's also working with the sample dicom files from the Medical Image Samples.
I believe the cause of the problem is the locale setting: Python cannot work out how to print a string that contains a non-ASCII character. (For me it only happened in Python 3 running on CentOS 7.6 Linux, printing to a terminal window on macOS.) You can use the locale command to see the current settings. Mine started out with everything set to "C". After I set the LANG environment variable to en_US.UTF-8, it worked for me.
In csh this is done using
setenv LANG en_US.UTF-8
In bash use:
export LANG=en_US.UTF-8
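When changing the locale is not an option, another approach is to make the text safe for whatever encoding the output stream reports. The sketch below is a minimal illustration using only the standard library; the helper name printable is hypothetical, and it escapes characters the target encoding cannot represent instead of raising UnicodeEncodeError.

```python
import sys


def printable(text, encoding=None):
    """Return text with characters the encoding cannot represent escaped."""
    # Fall back to ASCII when the stream does not report an encoding.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


# With an ASCII-only stream, the micro sign is escaped rather than raising.
print(printable("\u00b5-map", encoding="ascii"))  # prints \xb5-map
```

For a pydicom dataset, the same helper could be applied to str(ds) before printing, at the cost of showing escape sequences in place of the original characters.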
My problem resulted from having 'µ' in the Series Description element. The file was an attenuation map from a SPECT reconstruction on a Siemens scanner. I used the following Python code to help figure out the problem.
#! /usr/bin/env python3
import pydicom as dicom
from sys import exit, argv
def myprint(ds, indent=0):
    """Go through all items in the dataset and print them with custom format
    Modelled after Dataset._pretty_str()
    """
    dont_print = ['Pixel Data', 'File Meta Information Version']
    indent_string = " " * indent
    next_indent_string = " " * (indent + 1)
    for data_element in ds:
        if data_element.VR == "SQ": # a sequence
            print(indent_string, data_element.name)
            for sequence_item in data_element.value:
                myprint(sequence_item, indent + 1)
                print(next_indent_string + "---------")
        else:
            if data_element.name in dont_print:
                print("""<item not printed -- in the "don't print" list>""")
            else:
                repr_value = repr(data_element.value)
                if len(repr_value) > 50:
                    repr_value = repr_value[:50] + "..."
                try:
                    print("{0:s} {1:s} = {2:s}".format(indent_string,
                                                       data_element.name,
                                                       repr_value))
                except:
                    print(data_element.name,'****Error printing value')
for f in argv[1:]:
    ds = dicom.dcmread(f)
    myprint(ds, indent=1)
This is based on the myprint function from [1].
The code tries to print out all the data items. It catches exceptions and prints "****Error printing value" when there is an error.

Python: capturing both of sys.stdout and sys.stderr as a log file

Roughly speaking, I want to port this to pure Python:
#!/bin/bash
{
python test.py
} &> /tmp/test.log
This didn't work for some unknown reasons:
import os.path, sys
import tempfile
with open(os.path.join(tempfile.gettempdir(), "test.log"), "a") as fp:
    sys.stdout = sys.stderr = fp
    raise Exception("I'm dying")
The resulting test.log was empty (and I didn't see anything on my console,) when I tested it with Python 2.6.6, Python 2.7.8 and Python 3.4.2 on CentOS x86_64.
Ideally, though, I'd like a solution for Python 2.6.
(For now, it's tolerable to clutter the log file with intermixed output from stdout and stderr or multithreading, as long as any data won't simply disappear into a blackhole.)
Show me a concise and portable solution which is confirmed to work with an exception stack trace on sys.stderr. (Preferably something other than os.dup2)
Remember that file objects are closed after with blocks :)
Use simply this:
sys.stdout = sys.stderr = open("test.log","w")
raise Exception("Dead")
Content of test.log after exit:
Traceback (most recent call last):
File "test.py", line 5, in <module>
raise Exception("Dead")
Exception: Dead
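The reason the original attempt left an empty log can be shown in isolation: the exception leaves the with block first, which closes the file, and only afterwards does the interpreter try to write the traceback to sys.stderr. A minimal sketch, using a temporary file whose name is arbitrary:

```python
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "demo.log")

with open(log_path, "w") as fp:
    fp.write("written inside the with block\n")

# Leaving the with block closes the file object.
closed_after_block = fp.closed

try:
    fp.write("written after the with block\n")
    write_error = None
except ValueError as exc:
    # Writing to a closed file raises ValueError.
    write_error = str(exc)

print(closed_after_block, write_error)
```

Anything written to sys.stdout or sys.stderr after the block ends hits the same closed file, so keeping the file open for the lifetime of the process, as in the answer above, sidesteps the problem.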
You can use a method like this one:
import traceback
import sys
from contextlib import contextmanager
@contextmanager
def output_to_file(filepath, write_mode='w'):
    stdout_orig = None
    stderr_orig = None
    stdout_orig = sys.stdout
    stderr_orig = sys.stderr
    f = open(filepath, write_mode)
    sys.stdout = f
    sys.stderr = f
    try:
        yield
    except:
        # Write the traceback to the log while the file is still open.
        info = sys.exc_info()
        f.write('\n'.join(traceback.format_exception(*info)))
    f.close()
    sys.stdout = stdout_orig
    sys.stderr = stderr_orig
And the usage is:
with output_to_file('test.log'):
    print('hello')
    raise Exception('I am dying')
And cat test.log produces:
hello
Traceback (most recent call last):
File "<ipython-input-3-a3b702c7b741>", line 20, in outputi_to_file
yield
File "<ipython-input-4-f879d82580b2>", line 3, in <module>
raise Exception('I am dying')
Exception: I am dying
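On Python 3.5 or later only (so not the Python 2.6 target of the question), the standard library's contextlib.redirect_stdout and contextlib.redirect_stderr can take over the save-and-restore bookkeeping. The following is a sketch of the same idea under that assumption; the name output_to_file_py3 is hypothetical, and like the version above it swallows the exception after logging it.

```python
import contextlib
import traceback


@contextlib.contextmanager
def output_to_file_py3(filepath, write_mode="w"):
    """Send stdout, stderr and any exception traceback to one file."""
    with open(filepath, write_mode) as f, \
            contextlib.redirect_stdout(f), \
            contextlib.redirect_stderr(f):
        try:
            yield
        except Exception:
            # Log the traceback while the file is still open, then suppress it.
            traceback.print_exc(file=f)
```

The original streams are restored automatically when the block exits, so there is no need to keep stdout_orig and stderr_orig around.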
This works for me:
#!/usr/bin/env python
from __future__ import print_function
import os, os.path, sys, tempfile
old_out = os.dup(sys.stdout.fileno())
old_err = os.dup(sys.stderr.fileno())
with open(os.path.join(tempfile.gettempdir(), "test.log"), "a") as fp:
    fd = fp.fileno()
    os.dup2(fd, sys.stdout.fileno())
    os.dup2(fd, sys.stderr.fileno())
print("Testing")
print('testing errs', file=sys.stderr)
raise Exception("I'm dying")
The __future__ import is just for cleaner handling of Python 2 and Python 3 with the same example. (I've also changed the raise statement to instantiate an exception; strings as exceptions have been deprecated for a long time and are not supported under Python 3.)
The old_* values are only needed if we want to restore the original stdout and/or stderr after using the redirected file.

pandas.DataFrame.load/save between python2 and python3: pickle protocol issues

I haven't figured out how to load and save pickles between Python 2 and 3 with pandas DataFrames. There is a 'protocol' option in the pickler that I've played with unsuccessfully, but I'm hoping someone has a quick idea for me to try. Here is the code that produces the error:
python2.7
>>> import pandas; from pylab import *
>>> a = pandas.DataFrame(randn(10,10))
>>> a.save('a2')
>>> a = pandas.DataFrame.load('a2')
>>> a = pandas.DataFrame.load('a3')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/site-packages/pandas-0.10.1-py2.7-linux-x86_64.egg/pandas/core/generic.py", line 30, in load
return com.load(path)
File "/usr/local/lib/python2.7/site-packages/pandas-0.10.1-py2.7-linux-x86_64.egg/pandas/core/common.py", line 1107, in load
return pickle.load(f)
ValueError: unsupported pickle protocol: 3
python3
>>> import pandas; from pylab import *
>>> a = pandas.DataFrame(randn(10,10))
>>> a.save('a3')
>>> a = pandas.DataFrame.load('a3')
>>> a = pandas.DataFrame.load('a2')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.3/site-packages/pandas-0.10.1-py3.3-linux-x86_64.egg/pandas/core/generic.py", line 30, in load
return com.load(path)
File "/usr/local/lib/python3.3/site-packages/pandas-0.10.1-py3.3-linux-x86_64.egg/pandas/core/common.py", line 1107, in load
return pickle.load(f)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf4 in position 0: ordinal not in range(128)
Maybe expecting pickle to work between Python versions is a bit optimistic?
I had the same problem. You can change the protocol of the dataframe pickle file with the following function in python3:
import pickle
def change_pickle_protocol(filepath,protocol=2):
    with open(filepath,'rb') as f:
        obj = pickle.load(f)
    with open(filepath,'wb') as f:
        pickle.dump(obj,f,protocol=protocol)
Then you should be able to open it in python2 no problem.
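Whether a file was written with a protocol Python 2 can read can be checked from its first bytes: pickles written with protocol 2 or higher begin with the byte 0x80 followed by the protocol number. The sketch below uses only the standard pickle module; the helper name pickle_protocol_of is hypothetical.

```python
import pickle


def pickle_protocol_of(data):
    """Return the protocol number declared in a pickle byte string.

    Protocols 0 and 1 carry no header, so None is returned for them.
    """
    if len(data) >= 2 and data[0:1] == b"\x80":
        return data[1]
    return None


payload = {"a": [1, 2, 3]}
as_protocol_2 = pickle.dumps(payload, protocol=2)
print(pickle_protocol_of(as_protocol_2))  # prints 2
```

Reading the first two bytes of a file opened in 'rb' mode and passing them to the helper gives the same answer without unpickling anything.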
If you use pandas.DataFrame.to_pickle(), you can modify the pandas source code as follows to make the pickle protocol configurable:
1) In source file /pandas/io/pickle.py (before modification copy the original file as /pandas/io/pickle.py.ori) search for the following lines:
def to_pickle(obj, path):
pkl.dump(obj, f, protocol=pkl.HIGHEST_PROTOCOL)
Change these lines to:
def to_pickle(obj, path, protocol=pkl.HIGHEST_PROTOCOL):
pkl.dump(obj, f, protocol=protocol)
2) In source file /pandas/core/generic.py (before modification copy the original file as /pandas/core/generic.py.ori) search for the following lines:
def to_pickle(self, path):
return to_pickle(self, path)
Change these lines to:
def to_pickle(self, path, protocol=None):
return to_pickle(self, path, protocol)
3) Restart your Python kernel if it is running, then save your dataframe using any available pickle protocol (0, 1, 2, 3, 4):
# Python 2.x can read this
df.to_pickle('my_dataframe.pck', protocol=2)
# protocol will be the highest (4), Python 2.x can not read this
df.to_pickle('my_dataframe.pck')
4) After a pandas upgrade, repeat steps 1 and 2.
5) (optional) Ask the developers to add this capability to official releases (because your code will throw an exception in any other Python environment without these changes)
Nice day!
You can override the highest protocol available for the pickle package:
import pickle as pkl
import pandas as pd
if __name__ == '__main__':
    # this constant is defined in pickle.py in the pickle package:
    pkl.HIGHEST_PROTOCOL = 2
    # 'foo.pkl' was saved in pickle protocol 4
    df = pd.read_pickle(r"C:\temp\foo.pkl")
    # 'foo_protocol_2' will be saved in pickle protocol 2
    # and can be read in pandas with Python 2
    df.to_pickle(r"C:\temp\foo_protocol_2.pkl")
This is definitely not an elegant solution but it does the work without changing pandas code directly.
UPDATE: I found that newer versions of pandas allow you to specify the pickle protocol in the .to_pickle function:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_pickle.html
DataFrame.to_pickle(path, compression='infer', protocol=4)

python 2.7 / exec / what is wrong?

I have this code which runs fine in Python 2.5 but not in 2.7:
import sys
import traceback
try:
    from io import StringIO
except:
    from StringIO import StringIO
def CaptureExec(stmt):
    oldio = (sys.stdin, sys.stdout, sys.stderr)
    sio = StringIO()
    sys.stdout = sys.stderr = sio
    try:
        exec(stmt, globals(), globals())
        out = sio.getvalue()
    except Exception, e:
        out = str(e) + "\n" + traceback.format_exc()
    sys.stdin, sys.stdout, sys.stderr = oldio
    return out
print "%s" % CaptureExec("""
import random
print "hello world"
""")
And I get:
string argument expected, got 'str'
Traceback (most recent call last):
File "D:\3.py", line 13, in CaptureExec
exec(stmt, globals(), globals())
File "<string>", line 3, in <module>
TypeError: string argument expected, got 'str'
io.StringIO is confusing in Python 2.7 because it's backported from the 3.x bytes/string world. This code gets the same error as yours:
from io import StringIO
sio = StringIO()
sio.write("Hello\n")
causes:
Traceback (most recent call last):
File "so2.py", line 3, in <module>
sio.write("Hello\n")
TypeError: string argument expected, got 'str'
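The same text-versus-bytes split is still visible on Python 3, where io.StringIO accepts only text and io.BytesIO accepts only bytes. A small sketch, written for Python 3:

```python
import io

text_buffer = io.StringIO()
text_buffer.write(u"Hello\n")  # text is accepted

try:
    text_buffer.write(b"Hello\n")  # bytes are rejected
    error_message = None
except TypeError as exc:
    error_message = str(exc)

byte_buffer = io.BytesIO()
byte_buffer.write(b"Hello\n")  # bytes are accepted here

print(error_message)
```

On Python 2.7 a plain "Hello\n" literal is a byte string, which is why io.StringIO rejects it there in the same way.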
If you are only using Python 2.x, then skip the io module altogether, and stick with StringIO. If you really want to use io, change your import to:
from io import BytesIO as StringIO
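For readers who only need Python 3, where print is a function and io.StringIO is the natural buffer for text, the capture helper might be sketched as follows. This is an illustration rather than a port of the original: the name capture_exec is hypothetical, it runs the statement in a fresh namespace instead of globals(), and it relies on contextlib.redirect_stdout and contextlib.redirect_stderr (Python 3.5+).

```python
import contextlib
import io
import traceback


def capture_exec(stmt):
    """Run stmt and return everything it wrote to stdout or stderr."""
    buffer = io.StringIO()
    namespace = {}
    with contextlib.redirect_stdout(buffer), contextlib.redirect_stderr(buffer):
        try:
            exec(stmt, namespace)
        except Exception:
            # Append the traceback to the captured output instead of raising.
            traceback.print_exc(file=buffer)
    return buffer.getvalue()


print(capture_exec('print("hello world")'), end="")  # prints hello world
```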
It's bad news
io.StringIO wants to work with unicode. You might think you can fix it by putting a u in front of the string you want to print, like this:
print "%s" % CaptureExec("""
import random
print u"hello world"
""")
However, print is really broken for this, as it causes two writes to the StringIO. The first one is u"hello world", which is fine, but it is followed by "\n", which is a plain str.
So instead you need to write something like this:
print "%s" % CaptureExec("""
import random
sys.stdout.write(u"hello world\n")
""")
