Issue handling file from command line with biopython SeqIO - python

This is my first attempt at using commandline args other than the quick and dirty sys.argv[] and writing a more 'proper' python script. For some reason that I can now not figure out, it seems to be objecting to how I'm trying to use the input file from the command line.
The script is meant to take an input file and some numerical indices, and then slice out a subset region of the file. However, I keep getting an error saying that the variable I assign the parsed input file to is not defined:
joehealey@7c-d1-c3-89-86-2c:~/Documents/Warwick/PhD/Scripts$ python slice_genbank.py --input PAU_06042014.gbk -o test.gbk -s 3907329 -e 3934427
Traceback (most recent call last):
  File "slice_genbank.py", line 70, in <module>
    sub_record = record[start:end]
NameError: name 'record' is not defined
Here's the code. Where am I going wrong? (I'm sure it's simple):
#!/usr/bin/python
# This script is designed to take a genbank file and 'slice out'/'subset'
# regions (genes/operons etc.) and produce a separate file.
# Based upon the tutorial at http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc44

# Set up and handle arguments:
from Bio import SeqIO
import getopt
import sys

def main(argv):
    record = ''
    start = ''
    end = ''
    try:
        opts, args = getopt.getopt(argv, 'hi:o:s:e:', [
            'help',
            'input=',
            'outfile=',
            'start=',
            'end='
            ]
        )
        if not opts:
            print "No options supplied. Aborting."
            usage()
            sys.exit(2)
    except getopt.GetoptError:
        print "Some issue with commandline args.\n"
        usage()
        sys.exit(2)
    for opt, arg in opts:
        if opt in ("-h", "--help"):
            usage()
            sys.exit(2)
        elif opt in ("-i", "--input"):
            filename = arg
            record = SeqIO.read(arg, "genbank")
        elif opt in ("-o", "--outfile"):
            outfile = arg
        elif opt in ("-s", "--start"):
            start = arg
        elif opt in ("-e", "--end"):
            end = arg
    print("Slicing " + filename + " from " + str(start) + " to " + str(end))

def usage():
    print(
"""
This script 'slices' entries such as genes or operons out of a genbank,
subsetting them as their own file.
Usage:
python slice_genbank.py -h|--help -i|--input <genbank> -o|--outfile <genbank> -s|--start <int> -e|--end <int>
Options:
-h|--help Displays this usage message. No options will also do this.
-i|--input The genbank file you wish to subset a record from.
-o|--outfile The file name you wish to give to the new sliced genbank.
-s|--start An integer base index to slice the record from.
-e|--end An integer base index to slice the record to.
"""
    )

#Do the slicing
sub_record = record[start:end]
SeqIO.write(sub_record, outfile, "genbank")

if __name__ == "__main__":
    main(sys.argv[1:])
It's also possible there's an issue with the SeqIO.write syntax, but I haven't got as far as that yet.
EDIT:
Also forgot to mention that when I use record = SeqIO.read("file.gbk", "genbank") and write the file name directly into the script, it works correctly.

As said in the comments, your variable record is only defined inside the function main() (the same is true for start and end), so it is not visible to the rest of the program.
You can either return the values like this:
def main(argv):
    ...
    ...
    return record, start, end
Your call to main() can then look like this:
record, start, end = main(sys.argv[1:])
Make sure this call comes before the lines that use record.
Alternatively, you can move your main functionality (the slicing and writing) into the main function.
(Another way is to define the variables in the main program and then use the global keyword in your function; this is, however, not recommended.)
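To make the "return the values" approach concrete, here is a minimal sketch. It swaps getopt for argparse so that start and end are converted to integers (with getopt they arrive as strings, which cannot be used as slice indices), and it uses a plain string as a stand-in for the object that SeqIO.read() would return. The names parse_args and slice_record and the sample arguments are illustrative assumptions, not part of the original script.

```python
import argparse


def parse_args(argv):
    """Parse the command line and return (input path, output path, start, end)."""
    parser = argparse.ArgumentParser(description="Slice a region out of a record.")
    parser.add_argument("-i", "--input", required=True)
    parser.add_argument("-o", "--outfile", required=True)
    # type=int converts the indices, so they can be used directly in a slice.
    parser.add_argument("-s", "--start", type=int, required=True)
    parser.add_argument("-e", "--end", type=int, required=True)
    args = parser.parse_args(argv)
    return args.input, args.outfile, args.start, args.end


def slice_record(record, start, end):
    """Return the sub-region of anything that supports slicing."""
    return record[start:end]


# The values are returned to the caller, so they are visible at this level.
infile, outfile, start, end = parse_args(
    ["-i", "in.gbk", "-o", "out.gbk", "-s", "2", "-e", "5"]
)

# Hypothetical stand-in for: record = SeqIO.read(infile, "genbank")
record = "ACGTACGT"
sub_record = slice_record(record, start, end)
print(sub_record)
```

In the real script the stand-in string would be replaced by the record read from the input file, and sub_record would be passed to SeqIO.write as in the question.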


How to print the first N lines of a file in python with N as argument

How would I go about getting the first N lines of a text file in Python, with N given as a command-line argument?
usage:
python file.py datafile -N 10
My code
import sys
from itertools import islice
args = sys.argv
print (args)
if args[1] == '-h':
    print ("-N for printing the number of lines: python file.py datafile -N 10")
if args[-2] == '-N':
    datafile = args[1]
    number = int(args[-1])
    with open(datafile) as myfile:
        head = list(islice(myfile, number))
        head = [item.strip() for item in head]
        print (head)
        print ('\n'.join(head))
I wrote this program. Can you suggest a better way to write it?
Assuming that the print_head logic you've implemented need not be altered, here's the script I think you're looking for:
import sys
from itertools import islice

def print_head(file, n):
    if not file or not n:
        return
    with open(file) as myfile:
        head = [item.strip() for item in islice(myfile, n)]
    print(head)

def parse_args():
    result = {'script': sys.argv[0]}
    args = iter(sys.argv)
    for arg in args:
        if arg == '-F':
            result['filename'] = next(args)
        if arg == '-N':
            result['num_lines'] = int(next(args))
    return result

if __name__ == '__main__':
    script_args = parse_args()
    print_head(script_args.get('filename', ''), script_args.get('num_lines', 0))
Running the script
python file.py -F datafile -N 10
Note: The best way to implement this would be to use the argparse library.
You can access the arguments passed to the script through sys:
sys.argv
The list of command line arguments passed to a Python script. argv[0] is the script name (it is operating system dependent whether this is a full pathname or not). If the command was executed using the -c command line option to the interpreter, argv[0] is set to the string '-c'. If no script name was passed to the Python interpreter, argv[0] is the empty string.
So in code it would look like this:
import sys
print("All of argv")
print(sys.argv)
print("Last element every time")
print(sys.argv[-1])
Reading the documentation, you'll see that the first values stored in sys.argv vary according to how the user calls the script. If you run the code I pasted with different types of calls, you can see for yourself what values are stored.
For a basic first approach, access n through sys.argv[-1], which returns the last element every time, assuming the number is always passed as the last argument. You still have to try the conversion and ask for forgiveness to make sure the argument passed is a number. For that you would have:
import sys
try:
    n = int(sys.argv[-1])
except ValueError as v_e:
    print(f"Please pass a valid number as argument, not {sys.argv[-1]}")
That's pretty much it. Obviously, it's quite basic, you can improve this even more by having the users pass values with flags, like --skip-lines 10 and that would be your n, and it could be in any place when executing the script. I'd create a function in charge of translating sys.argv into a key,value dictionary for easy access within the script.
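A function that translates the argument list into a dictionary could look roughly like the sketch below. The name parse_flags, the "--" prefix convention, and the sample arguments are assumptions made for illustration; every flag is assumed to be followed by exactly one value.

```python
def parse_flags(argv):
    """Split an argument list into a {flag: value} dict and a list of positionals."""
    flags = {}
    positionals = []
    tokens = iter(argv)
    for token in tokens:
        if token.startswith("--"):
            # The token after a flag is taken as its value.
            # A flag given as the very last token gets None.
            flags[token[2:]] = next(tokens, None)
        else:
            positionals.append(token)
    return flags, positionals


# In a real script this would be parse_flags(sys.argv[1:]).
flags, positionals = parse_flags(["datafile", "--skip-lines", "10"])
n = int(flags["skip-lines"])
print(positionals, n)
```

Because the flag is looked up by name, it can appear anywhere on the command line. The int() call still needs the try/except shown above if the value may not be a number.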
Arguments are available via the sys module.
Example 1: ./file.py datafile 10
#!/usr/bin/env python3
import sys
filename = sys.argv[1]
N = int(sys.argv[2])
with open(filename) as myfile:
    head = myfile.readlines()[0:N]
print(head)
Example 2: ./file.py datafile --N 10
If you want to pass multiple optional arguments you should have a look at the argparse package.
#!/usr/bin/env python3
import argparse
parser = argparse.ArgumentParser(description='Read head of file.')
parser.add_argument('file', help='Textfile to read')
parser.add_argument('--N', type=int, default=10, help='Number of lines to read')
args = parser.parse_args()
with open(args.file) as myfile:
    head = myfile.readlines()[0:args.N]
print(head)

How to dynamically import variables after executing python script from within another script

I want to extract a variable named value that is set in a second, arbitrarily chosen, python script.
The process works when I do it manually in Python's interactive mode, but when I run the main script from the command line, value is not imported.
The main script's input arguments are already successfully forwarded, but value seems to be in the local scope of the executed script.
I already tried to define value in the main script, and I also tried to set its accessibility to global.
This is the script I have so far
import sys
import getopt

def main(argv):
    try:
        (opts, args) = getopt.getopt(argv, "s:o:a:", ["script=", "operations=", "args="])
    except getopt.GetoptError as e:
        print(e)
        sys.exit(2)
    # script to be called
    script = ""
    # arguments that are expected by script
    operations = []
    argv = []
    for (opt, arg) in opts:
        if opt in ("-o", "--operations"):
            operations = arg.split(',')
            print("operations = '%s'" % str(operations))
        elif opt in ("-s", "--script"):
            script = arg
            print("script = '%s'" % script)
        elif opt in ("-a", "--args"):
            argv = arg.split(',')
            print("arguments = '%s'" % str(argv))
    # script should define variable 'value'
    exec(open(script).read())
    print("Executed '%s'. Value is printed below." % script)
    print("Value = '%s'" % value)

if __name__ == "__main__":
    main(sys.argv[1:])
The value variable has been put into your locals dictionary by the exec, but was not visible to the compiler. You can retrieve it like this:
print("Value = '%s'" % locals()['value'])
I would prefer an import solution
Using locals() as @cdarke suggested yielded the correct result!
exec(open(script).read())
print("Executed '%s'. Value is printed below." % script)
print("Value = '%s'" % locals()['value'])
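A variation on the same idea, sketched here under the assumption that the executed script really does define value, is to hand exec() an explicit dictionary and read the result from it instead of relying on locals(). The temporary script and the helper name run_script are made up for the demonstration.

```python
import os
import tempfile

# Write a throwaway script that defines 'value' (hypothetical content).
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
    handle.write("value = 6 * 7\n")
    script = handle.name


def run_script(path):
    """Execute the file at 'path' and return the namespace it populated."""
    namespace = {}
    with open(path) as source:
        # Every name the script defines ends up in this dict.
        exec(source.read(), namespace)
    return namespace


namespace = run_script(script)
os.remove(script)
print("Value = '%s'" % namespace["value"])
```

Passing the dictionary explicitly keeps the executed script's names separate from the function's own local variables.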
In case your import needs to be dynamic, you can use
impmodule = __import__("modulename") # no .py suffix needed
then refer to value via
impmodule.value
There are several ways to achieve the same results.
See the answers on this topic on SO
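For the dynamic import route, importlib.import_module is the standard-library function usually preferred over calling __import__ directly. The sketch below assumes the script's folder can be put on sys.path; the module name child_script_demo and its content are invented for the example.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create a hypothetical module that defines 'value'.
    with open(os.path.join(folder, "child_script_demo.py"), "w") as handle:
        handle.write("value = 'hello'\n")

    # The folder must be importable; add it temporarily to sys.path.
    sys.path.insert(0, folder)
    try:
        # The module name is an ordinary string, so it can come from the command line.
        module = importlib.import_module("child_script_demo")
    finally:
        sys.path.remove(folder)

print(module.value)
```

Unlike exec(), importing runs the file as a module, so code guarded by if __name__ == "__main__" in that file is not executed.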

I'm getting the following error and can't work it out

I'm learning Python and started putting together the below code. I'm trying to get the fread function working correctly but I'm getting an error.
I've tried a few ways to fix it but of course if I don't know what's causing it I'm never going to fix it.
I'm hoping someone could help me out.
Error
unknown@ubuntu:~$ ./attack.py -f wordfile.txt
Traceback (most recent call last):
  File "./attack.py", line 63, in <module>
    print fread(list)
  File "./attack.py", line 20, in fread
    flist = open(list).readlines()
TypeError: coercing to Unicode: need string or buffer, type found
CODE
#!/usr/bin/python
import sys, getopt, socket, fileinput, traceback
import dns.query, dns.message, dns.name, adns
from Queue import Queue
from threading import Thread

def usage():
    print "-h --help: help\n"
    print "-f --file: File to read bruteforce domain list from.\n"
    print "-p --proxy: Proxy address and port. e.g http://192.168.1.64:8080\n"
    print "-d --domain: Domain to bruteforce.\n"
    print "-t --thread: Thread count.\n"
    print "-e: Turn debug on.\n"
    sys.exit()

def fread(list, *args):
    flist = open(list).readlines()
    return flist

def addcheck(fcontent):
    data =[]
    c=adns.init()
    for sub in file:
        SubDomain = fcontent + domain
        data[SubDomain] = c.synchronous(SubDomain, adns.rr.A)
    return data

def main(argv):
    list = None
    proxy = None
    domain = None
    FILE= None
    try:
        opts, argv =getopt.getopt(argv, "h:f:p:d:t:e",["help", "file=", "proxy=", "domain=", "thread="])
    except getopt.GetoptError as err:
        print str(err)
        usage()
        sys.exit(2)
    for opt, arg in opts:
        if opt in ("-h", "--help"):
            usage()
            sys.exit()
        elif opt in ("-f", "--file"):
            list = arg
        elif opt in ("-p", "--proxy"):
            proxy = arg
        elif opt in ("-d", "--domain"):
            domain = arg
        elif opt in ("-t", "--thread"):
            thread = arg
        elif opt in '-e':
            global _debug
print fread(list)

if __name__ == "__main__":
    main(sys.argv[1:])
You are passing in the list type object here:
print fread(list)
This is outside of the main function, so list is still bound to the built-in type.
You probably meant that line to be part of the main() function. If so, indent it further to match the rest of the code in the function.
You really should not use list as a variable name, however. There already is a built-in type of that name; as a result your indentation error gave you a confusing exception message. Perhaps fname or filename would have been a better choice.
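Putting both suggestions together, a trimmed-down Python 3 sketch might look like this. It keeps only the -f option, renames the variable to filename, and calls fread() from inside main(). Unlike the original, this fread() strips the trailing newlines, and the temporary word list exists only so the example can run on its own.

```python
import getopt
import os
import tempfile


def fread(filename):
    """Return the lines of the word list with trailing newlines removed."""
    with open(filename) as handle:
        return [line.rstrip("\n") for line in handle]


def main(argv):
    filename = None  # named 'filename' rather than 'list', so the built-in is not shadowed
    opts, _ = getopt.getopt(argv, "hf:", ["help", "file="])
    for opt, arg in opts:
        if opt in ("-f", "--file"):
            filename = arg
    if filename is None:
        raise SystemExit("usage: attack.py -f <wordfile>")
    # This call is inside main(), where 'filename' is bound to the argument.
    return fread(filename)


# Demonstration with a throwaway word list (hypothetical content).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("www\nmail\n")
words = main(["-f", handle.name])
os.remove(handle.name)
print(words)
```

With a distinct variable name, a misplaced call at module level would fail with a NameError that points straight at the problem, instead of passing the built-in list type to open().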

how to set optional input parameter in python

I have the following snippet where I am checking for the first argument and running into the following error. Can anyone help with how to make the first argument optional?
SNIPPET CODE:-
branch = ''
if sys.argv[1]:
    branch = sys.argv[1]
ERROR:-
Traceback (most recent call last):
  File "test.py", line 102, in <module>
    main()
  File "test.py", line 66, in main
    if sys.argv[1]:
IndexError: list index out of range
To pass parameters to a Python script you can use the getopt module. Parameters can then be optional and can be given in any order, as long as the correct flag is present.
In the example below the user has two optional parameters to set, the input-file name and the database name. The code can be called using
python example.py -f test.txt -d HelloWorld
or
python example.py --file=test.txt --database=HelloWorld
or a mix and match of both.
The flags and names can be changed to reflect your needs.
import getopt
import sys

def main(argv):
    inputFileName = ''
    databaseName = ''
    try:
        opts, args = getopt.getopt(argv,"f:d:",["file=","database="])
    except getopt.GetoptError:
        print('-f <inputfile> -d <databasename>')
        sys.exit()
    for opt, arg in opts:
        if opt in ('-f','--file'):
            inputFileName = arg
        elif opt in ('-d','--database'):
            databaseName = arg

if __name__ == "__main__":
    main(sys.argv[1:])
Use exception handling (EAFP):
try:
    branch = sys.argv[1]
except IndexError:
    branch = ''
You can use:
branch = sys.argv[1] if len(sys.argv) >= 2 else ''
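If the script grows beyond one argument, argparse can express the same thing declaratively. This is only a sketch: nargs="?" makes the positional argument optional and default supplies the fallback value. The sample branch name is made up.

```python
import argparse

parser = argparse.ArgumentParser(description="Example with an optional first argument.")
# nargs="?" means zero or one value; default is used when the argument is absent.
parser.add_argument("branch", nargs="?", default="")

# Explicit lists are parsed here for demonstration; a script would call parser.parse_args().
without_branch = parser.parse_args([]).branch
with_branch = parser.parse_args(["develop"]).branch
print(repr(without_branch), repr(with_branch))
```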

Python optparse not seeing argument

I am trying to pass '-f nameoffile' to the program when I call it from the command line. I got this from the Python site's documentation, but when I pass '-f filename' or '--file=filename' it throws the error that I didn't pass enough arguments. If I pass -h the program responds as it should and gives me the help. Any ideas? I imagine it's something simple that I am overlooking. Any and all help is great, thanks, Justin.
[justin87@el-beasto-loco python]$ python openall.py -f chords.tar
Usage: openall.py [options] arg
openall.py: error: incorrect number of arguments
[justin87@el-beasto-loco python]$
#!/usr/bin/python
import tarfile
import os
import zipfile
from optparse import OptionParser

def check_tar(file):
    if tarfile.is_tarfile(file):
        return True

def open_tar(file):
    try:
        tar = tarfile.open(file)
        tar.extractall()
        tar.close()
    except tarfile.ReadError:
        print "File is somehow invalid or can not be handled by tarfile"
    except tarfile.CompressionError:
        print "Compression method is not supported or data cannot be decoded"
    except tarfile.StreamError:
        print "Is raised for the limitations that are typical for stream-like TarFile objects."
    except tarfile.ExtractError:
        print "Is raised for non-fatal errors when using TarFile.extract(), but only if TarFile.errorlevel== 2."

def check_zip(file):
    if zipfile.is_zipfile(file):
        return True

def open_zip(file):
    try:
        zip = zipfile.ZipFile(file)
        zip.extractall()
        zip.close()
        #open the zip
        print "GOT TO OPENING"
    except zipfile.BadZipfile:
        print "The error raised for bad ZIP files (old name: zipfile.error)."
    except zipfile.LargeZipFile:
        print "The error raised when a ZIP file would require ZIP64 functionality but that has not been enabled."

rules = ((check_tar, open_tar),
         (check_zip, open_zip)
         )

def checkall(file):
    for checks, extracts in rules:
        if checks(file):
            return extracts(file)

def main():
    usage = "usage: %prog [options] arg"
    parser = OptionParser(usage)
    parser.add_option("-f", "--file", dest="filename",
                      help="read data from FILENAME")
    (options, args) = parser.parse_args()
    if len(args) != 1:
        parser.error("incorrect number of arguments")
    file = options.filename
    checkall(file)

if __name__ == '__main__':
    main()
Your problem is probably the if len(args) != 1:. That is looking for an additional argument (i.e. not an option). If you remove that check and look at your options dictionary you should see {'filename': 'blah'}.
Your input filename isn't an option to the program, it's an argument:
def main():
    usage = "Usage: %prog [options] FILE"
    description = "Read data from FILE."
    parser = OptionParser(usage, description=description)
    (options, args) = parser.parse_args()
    if len(args) != 1:
        parser.error("incorrect number of arguments")
    file = args[0]
    checkall(file)
You can usually tell the difference because options generally have sensible defaults while arguments don't.
After parsing the options out of the argument list, you check that you were passed an argument. This is independent of the argument to -f. It sounds like you're just not passing this argument. Since you also don't actually use this argument, you should probably just remove the check on len(args).
You can set the 'action' attribute in the 'add_option()' method to 'store', which tells the optparse object to store the argument immediately following the option flag; this is already the default behavior, so setting it is optional. The value following the flag will then be stored in 'options.filename' and not in args. I also think that the
if len(args) != 1:
check is an issue: you will get the same error message whenever len(args) is greater than or less than 1.
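The difference between an option value and a positional argument can be seen by parsing explicit argument lists with the same parser definition as in the question. This is a small sketch in Python 3 syntax; the file name is taken from the question's command line.

```python
from optparse import OptionParser

parser = OptionParser("usage: %prog [options] arg")
parser.add_option("-f", "--file", dest="filename",
                  help="read data from FILENAME")

# Case 1: the file name follows -f, so it is stored on 'options' and 'args' stays empty.
options, args = parser.parse_args(["-f", "chords.tar"])
option_case = (options.filename, args)

# Case 2: the file name is given on its own, so it lands in 'args' instead.
options, args = parser.parse_args(["chords.tar"])
positional_case = (options.filename, args)

print(option_case)
print(positional_case)
```

In the first case len(args) is 0, which is why the len(args) != 1 check in the question rejects the command even though -f was parsed correctly.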
