Can't get output from Tesseract command run through os.system - python

I've created a function which loops over images and gets the orientation from the image with the tesseract library. The code looks like this:
import os
import re

def fix_incorrect_orientation(pathName):
    for filename in os.listdir(pathName):
        tesseractResult = str(os.system('tesseract ' + pathName + '/' + filename + ' - -psm 0'))
        print('tesseractResult: ' + tesseractResult)
        regexObj = re.search('([Orientation:]+[\s][0-9]{1})',tesseractResult)
        if regexObj:
            orientation = regexObj.groups(0)[0]
            print('orientation123: ' + str(orientation))
        else:
            print('Not getting in the Regex.')
However, tesseractResult is always 0, even though running the same command in a terminal gives the following output:
Orientation: 3
Orientation in degrees: 90
Orientation confidence: 19.60
Script: 1
Script confidence: 21.33
I've tried capturing the output of os.system in several ways, for example with Popen and other subprocess functions, but without any success. It seems that I can't capture the output of the tesseract command.
So, how exactly should I do this?
Thanks,
Yenthe

Literally 10 minutes after asking the question I found a way. First import commands:
import commands
And then the following code will do the trick:
import os
import re

def fix_incorrect_orientation(pathName):
    for filename in os.listdir(pathName):
        tesseractResult = str(commands.getstatusoutput('tesseract ' + pathName + '/' + filename + ' - -psm 0'))
        print('tesseractResult: ' + tesseractResult)
        regexObj = re.search('([Orientation:]+[\s][0-9]{1})',tesseractResult)
        if regexObj:
            orientation = regexObj.groups(0)[0]
            print('orientation123: ' + str(orientation))
        else:
            print('Not getting in the Regex.')
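This runs the command through the commands library, and getstatusoutput captures its output. It returns a (status, output) tuple, which the code converts to a string before searching it. The first version could not work because os.system returns only the command's exit status (0 on success), never its output. Note that the commands module exists only in Python 2; in Python 3 the same function is available as subprocess.getstatusoutput.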
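For comparison, here is a minimal Python 3 sketch of the same idea using subprocess.run. It assumes the Tesseract 3.x "-psm 0" spelling used above. Newer releases spell it "--psm 0" and may format the report differently.

Stderr is merged into stdout, mirroring the 2>&1 that getstatusoutput applies, in case the report is written to stderr. The pattern is also tightened. [Orientation:]+ is a character class that matches any run of those characters rather than the literal word, and it only happens to work on this output. The helper names run_osd and parse_orientation are hypothetical.

```python
import os
import re
import subprocess
from typing import Optional

# "Orientation: 3" at the start of a line; requiring the colon right after the
# word means "Orientation in degrees: 90" is not matched.
ORIENTATION_RE = re.compile(r"^Orientation:\s*(\d+)", re.MULTILINE)


def parse_orientation(report: str) -> Optional[int]:
    """Return the number on the 'Orientation:' line, or None if there is none."""
    match = ORIENTATION_RE.search(report)
    return int(match.group(1)) if match else None


def run_osd(image_path: str) -> str:
    """Run Tesseract's orientation detection and return everything it printed."""
    result = subprocess.run(
        ["tesseract", image_path, "-", "-psm", "0"],  # assumption: 3.x flag spelling
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr, like getstatusoutput's 2>&1
        universal_newlines=True,   # decode the output to str
    )
    return result.stdout


def fix_incorrect_orientation(path_name: str) -> None:
    """Print the detected orientation for every file in path_name."""
    for filename in os.listdir(path_name):
        report = run_osd(os.path.join(path_name, filename))
        print(filename, "orientation:", parse_orientation(report))
```

Passing the arguments as a list also avoids trouble with spaces in file names, because no shell is involved.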

Related

Create a .exe that takes input

I'm trying to convert a Python function (in a .py file) that takes two inputs into a .exe file, which I then want to run from Python.
The function looks like this and works perfectly:
def ncd(File1, FileN):
    # ..do something..
    return
To convert this to a .exe I'm using PyInstaller from the command line, and that step works fine.
But when I run the .exe from Python, it doesn't receive the two input files that the function needs:
import subprocess
File1 = "Cam0001.dat"
FileN = "Cam1000.dat"
exe_ncd = "ncd.exe"
subprocess.run("\"" + exe_ncd + "\"" + " " + "\"" + File1 + "\"" + " " + "\"" + FileN + "\"", shell=True)
How can I do it? Thanks a lot :)
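A likely cause, assuming ncd.py only defines ncd and never calls it: the executable PyInstaller builds runs the script's top-level code. Command-line arguments arrive as strings in sys.argv, not as function parameters.

Below is a minimal sketch of an entry point that reads the two paths from sys.argv and forwards them to ncd. The body of ncd is a placeholder, and main is a hypothetical name.

```python
import sys


def ncd(file1, file_n):
    # Placeholder: the real body ("..do something..") is not shown in the question.
    return file1, file_n


def main(argv=None):
    """Read the two input paths from the command line and pass them to ncd."""
    args = sys.argv[1:] if argv is None else list(argv)
    if len(args) != 2:
        print("usage: ncd.exe FILE1 FILEN", file=sys.stderr)
        return 2
    ncd(args[0], args[1])
    return 0


if __name__ == "__main__":
    # Runs when the script (or the .exe built from it) is started.
    # Wrapping it as sys.exit(main()) would also report the status to the caller.
    main()
```

The caller can then pass each argument as its own list element instead of building a quoted string: subprocess.run(["ncd.exe", "Cam0001.dat", "Cam1000.dat"]).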

Snakemake: Does not want to execute my rule?

I'm a beginner with coding and Snakemake, and I'm really struggling to understand my problem. Running the Snakefile below produces no error, but it does not execute the Bowtie rule. With --dryrun it shows:
Building DAG of jobs... Nothing to be done.
My guess is that I mixed something up with the wildcards, and Snakemake thinks the output file already exists, so it does not execute the rule at all. The rule does work when I hard-code it. I tried changing the wildcards but can't get it to run.
#Snakefile
configfile: "../../config/config.yaml"
INPUTDIR = str(config["paths"]["input"])
OUTPUTDIR = str(config["paths"]["output"])
FILE_FORMAT = str(config["paths"]["file_format"])
DATABASEPATH = str(config["database"]["Bowtie_Database"])
(SAMPLES, NUMBERS) = glob_wildcards(INPUTDIR + "/{sample}_{number, [1,2]}." + FILE_FORMAT)
DATABASE, = glob_wildcards(DATABASEPATH + "/{bowtie_ref}")
#Outputfiles
rule all:
    input:
        #FastQC raw
        expand(OUTPUTDIR + "/FastQC/raw/{sample}_{number}_fastqc.html", sample=SAMPLES,number=NUMBERS),
        expand(OUTPUTDIR + "/FastQC/raw/{sample}_{number}_fastqc.zip", sample=SAMPLES,number=NUMBERS),
        #Bowtie output
        expand(OUTPUTDIR + "/Bowtie/{bowtie_ref}_{sample}.sam", bowtie_ref=DATABASE,sample=SAMPLES),
        #macs2 output
        #"/home/henri/MPI/Pipeline/Mus/results/Macs2/eg2.bed"
#######
# Q C #
#######
#Quality Control for raw data with FastQC
rule qc_raw_fastqc:
    params:
        threads = config["threads"]
    conda:
        "envs/fastqc.yml"
    input:
        INPUTDIR + "/{sample}_{number}." + FILE_FORMAT
    output:
        html = OUTPUTDIR + "/FastQC/raw/{sample}_{number}_fastqc.html",
        zip = OUTPUTDIR + "/FastQC/raw/{sample}_{number}_fastqc.zip"
    message:
        "Doing quality control for raw reads with FastQC"
    shell:
        "fastqc -o {config[paths][output]}/FastQC/raw {input}"
################
## B O W T I E #
################
#mapping on ref. genome with Bowtie2
rule Bowtie:
    params:
        threads = config["threads"]
    conda:
        "envs/fastqc.yml"
    input:
        expand(INPUTDIR + "/Bowtie_Database/{{bowtie_ref}}{ending}", ending=[".1.bt2",".2.bt2",".3.bt2",".4.bt2",".rev.1.bt2",".rev.2.bt2"]),
        R1 = INPUTDIR + "{sample}.fastq",
        R2 = INPUTDIR + "{sample}.fastq"
    output:
        OUTPUTDIR + "/Bowtie/{bowtie_ref}_{sample}.sam"
    message:
        "Alignment with Bowtie2 this will take a while"
    shell:
        "bowtie2 -x {INPUTDIR}/{wildcards.bowtie_ref} -1 {input.R1} -2 {input.R2} -S {output}"
Any help or ideas would be really appreciated, thank you!
@DmitryKuzminov is probably right: the output files already exist and are newer than the inputs, so Snakemake considers them up to date.
You can force the re-execution of rule Bowtie (and everything that depends on its output) with:
snakemake --forcerun Bowtie ...
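The up-to-date check can be illustrated with a simplified model: a job is scheduled when one of its outputs is missing or older than its newest input. This is not Snakemake's actual code. Recent Snakemake versions can also rerun jobs for other reasons, such as changed params, code or input sets. needs_rerun is a hypothetical name.

```python
import os
from typing import Iterable


def needs_rerun(inputs: Iterable[str], outputs: Iterable[str]) -> bool:
    """Simplified mtime check: True if any output is missing or older than the newest input."""
    outputs = list(outputs)
    # No outputs, or any missing output, forces the job to run.
    if not outputs or not all(os.path.exists(p) for p in outputs):
        return True
    # Otherwise compare the newest input with the oldest output.
    newest_input = max((os.path.getmtime(p) for p in inputs), default=0.0)
    oldest_output = min(os.path.getmtime(p) for p in outputs)
    return oldest_output < newest_input
```

If every .sam file already exists and is newer than its inputs, this check returns False for each job, which corresponds to "Nothing to be done."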

Imagemagick's convert errors in python script with subprocess.Popen

I am trying to generate images with a transparent background using a Python script run from the command line, but I'm having a hard time passing all the arguments to subprocess.Popen without ImageMagick's convert throwing errors.
Here is my code:
# Import modules
import os
import subprocess as sp
# Define useful variables
fileList = os.listdir('.')
fileList.remove(currentScriptName)
# Interpret return code
def interpretReturnCode(returnCode) :
    return 'OK' if returnCode is 0 else 'ERROR, check the script'
# Create background images
def createDirectoryAndBackgroundImage() :
    # Ask if numbers-height or numbers-width before creating the directory
    numbersDirectoryType = raw_input('Numbers directory: type "h" for "numbers-height" or "w" for "numbers-width": ')
    if numbersDirectoryType == 'h' :
        # Create 'numbers-height' directory
        numbersDirectoryName = 'numbers-height'
        numbersDirectory = interpretReturnCode(sp.call(['mkdir', numbersDirectoryName]))
        print '%s%s' % ('Create "numbers-height" directory...', numbersDirectory)
        # Create background images
        startNumber = int(raw_input('First number for the background images: '))
        endNumber = (startNumber + len(fileList) + 1)
        for x in range(startNumber, endNumber) :
            createNum = []
            print 'createNum just after reset and before adding things to it: ', createNum, '\n'
            print 'start' , x, '\n'
            createNum = 'convert -size 143x263 xc:transparent -font "FreeSans-Bold" -pointsize 22 -fill \'#242325\' "text 105,258'.split()
            createNum.append('\'' + str(x) + '\'"')
            createNum.append('-draw')
            createNum.append('./' + numbersDirectoryName + '/' + str(x) + '.png')
            print 'createNum set up, createNum submittet to subprocess.Popen: ', createNum
            createNumImage = sp.Popen(createNum, stdout=sp.PIPE)
            createNumImage.wait()
            creationNumReturnCode = interpretReturnCode(createNumImage.returncode)
            print '%s%s%s' % ('\tCreate numbers image...', creationNumReturnCode, '\n')
    elif numbersDirectoryType == 'w' :
        numbersDirectoryName = 'numbers-width'
        numbersDirectory = interpretReturnCode(sp.call(['mkdir', numbersDirectoryName]))
        print '%s%s' % ('Create "numbers-width" directory...', numbersDirectory)
        # Create background images
        startNumber = int(raw_input('First number for the background images: '))
        endNumber = (startNumber + len(fileList) + 1)
        for x in range(startNumber, endNumber) :
            createNum = []
            print 'createNum just after reset and before adding things to it: ', createNum, '\n'
            print 'start' , x, '\n'
            createNum = 'convert -size 224x122 xc:transparent -font "FreeSans-Bold" -pointsize 22-fill \'#242325\' "text 105,258'.split()
            createNum.append('\'' + str(x) + '\'"')
            createNum.append('-draw')
            createNum.append('./' + numbersDirectoryName + '/' + str(x) + '.png')
            print 'createNum set up, createNum submittet to subprocess.Popen: ', createNum
            createNumImage = sp.Popen(createNum, stdout=sp.PIPE)
            createNumImage.wait()
            creationNumReturnCode = interpretReturnCode(createNumImage.returncode)
            print '%s%s%s' % ('\tCreate numbers image...', creationNumReturnCode, '\n')
    else :
        print 'No such directory type, please start again'
        numbersDirectoryType = raw_input('Numbers directory: type "h" for "numbers-height" or "w" for "numbers-width": ')
For this I get the following errors, for each picture:
convert.im6: unable to open image `'#242325'': No such file or directory # error/blob.c/OpenBlob/2638.
convert.im6: no decode delegate for this image format `'#242325'' # error/constitute.c/ReadImage/544.
convert.im6: unable to open image `"text': No such file or directory # error/blob.c/OpenBlob/2638.
convert.im6: no decode delegate for this image format `"text' # error/constitute.c/ReadImage/544.
convert.im6: unable to open image `105,258': No such file or directory # error/blob.c/OpenBlob/2638.
convert.im6: no decode delegate for this image format `105,258' # error/constitute.c/ReadImage/544.
convert.im6: unable to open image `'152'"': No such file or directory # error/blob.c/OpenBlob/2638.
convert.im6: no decode delegate for this image format `'152'"' # error/constitute.c/ReadImage/544.
convert.im6: option requires an argument `-draw' # error/convert.c/ConvertImageCommand/1294.
I tried changing the order of the arguments, without success. I also tried shell=True in Popen, but then interpretReturnCode returns OK while no image is created (the numbers-height folder is empty).
I would strongly recommend following this process:
1. Pick a single file and directory.
2. Change the above so that sp.Popen is replaced by a print statement.
3. Run the modified script from the command line.
4. Try running the printed command from the command line.
5. Modify the command line until it works.
6. Modify the script until it produces exactly the same command line.
7. Change the print back to sp.Popen.
Then, if you still have a problem, try modifying your command so that it starts with echo convert. That way you can see what, if anything, is happening to the parameters while sp.Popen processes them.
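Steps 2 to 4 become less error-prone if each argument is printed shell-quoted, so the printed line can be pasted into a terminal unchanged. Below is a minimal Python 3 sketch. It assumes a POSIX shell, since shlex.quote does not produce cmd.exe quoting. as_shell_command and run_or_print are hypothetical helper names.

```python
import shlex
import subprocess
from typing import Sequence


def as_shell_command(args: Sequence[str]) -> str:
    """Quote each argument so the printed line can be pasted into a POSIX shell."""
    return " ".join(shlex.quote(arg) for arg in args)


def run_or_print(args: Sequence[str], dry_run: bool = True) -> int:
    """With dry_run=True, print the exact command instead of running it (step 2)."""
    if dry_run:
        print(as_shell_command(args))
        return 0  # nothing was run, so nothing failed
    return subprocess.call(list(args))
```

On Python 3.8 and later, shlex.join(args) produces the same string as as_shell_command.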
There is also this handy hint from the Python documentation:
>>> import shlex, subprocess
>>> command_line = raw_input()
/bin/vikings -input eggs.txt -output "spam spam.txt" -cmd "echo '$MONEY'"
>>> args = shlex.split(command_line)
>>> print args
['/bin/vikings', '-input', 'eggs.txt', '-output', 'spam spam.txt', '-cmd', "echo '$MONEY'"]
>>> p = subprocess.Popen(args) # Success!
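Following that hint, the argument list for convert can be derived from a command line that works in the shell. Each option and each option value becomes one list element, and the shell quotes disappear. Without a shell, quotes such as the ones around '#242325' become part of the value.

In the question, the -draw value text 105,258 '152' is split into several elements, and -draw is appended after those pieces. The errors above show what follows: convert treats the pieces as image file names and then finds -draw without an argument.

Below is a Python 3 sketch that reuses the size, font, colour and text position from the question. build_convert_args and create_number_image are hypothetical names.

```python
import subprocess
from typing import List


def build_convert_args(size: str, number: int, out_dir: str) -> List[str]:
    """Return convert's argv with exactly one list element per argument.

    No shell is involved, so none of the elements carry shell quotes.
    """
    return [
        "convert",
        "-size", size,                                 # e.g. "143x263"
        "xc:transparent",                              # transparent canvas
        "-font", "FreeSans-Bold",
        "-pointsize", "22",
        "-fill", "#242325",
        "-draw", "text 105,258 '{}'".format(number),   # option first, then its whole value
        "./{}/{}.png".format(out_dir, number),         # output file comes last
    ]


def create_number_image(size: str, number: int, out_dir: str) -> int:
    """Run convert and return its exit status (0 means success)."""
    return subprocess.call(build_convert_args(size, number, out_dir))
```

When checking the status, compare it with == 0 rather than is 0: is tests object identity, not numeric equality.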

Trouble calling EMBOSS program from python

I am having trouble calling sixpack, an EMBOSS program that runs from the command line, from Python.
I run Python on Windows 7 (Python 3.2.3, Biopython 1.59, EMBOSS 6.4.0.4). Sixpack translates a DNA sequence in all six reading frames and writes two output files: a sequence file identifying ORFs, and a file containing the protein sequences.
There are three required arguments, which work when I call sixpack from the command line: -sequence [input file], -outseq [output sequence file] and -outfile [protein sequence file]. I have been using the subprocess module instead of os.system, as I have read that it is more powerful and versatile.
The following is my python code, which runs without error but does not produce the desired output files.
from Bio import SeqIO
import re
import os
import subprocess
infile = input('Full path to EXISTING .fasta file would you like to open: ')
outdir = input('NEW Directory to write outfiles to: ')
os.mkdir(outdir)
for record in SeqIO.parse(infile, "fasta"):
    print("Translating (6-Frame): " + record.id)
    ident=re.sub("\|", "-", record.id)
    print (infile)
    print ("Old record ID: " + record.id)
    print ("New record ID: " + ident)
    subprocess.call (['C:\memboss\sixpack.exe', '-sequence ' + infile, '-outseq ' + outdir + ident + '.sixpack', '-outfile ' + outdir + ident + '.format'])
    print ("Translation of: " + infile + "\nWritten to: " + outdir + ident)
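Found the answer: I was using the wrong syntax to call subprocess. Each option and its value must be separate list elements; '-sequence ' + infile is passed to sixpack as one single argument. This is the correct syntax:
subprocess.call([r'C:\memboss\sixpack.exe', '-sequence', infile, '-outseq', outdir + ident + '.sixpack', '-outfile', outdir + ident + '.format'])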
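Here is a sketch of the same call with the argument list built in one place, assuming the sixpack path from the question. It swaps in os.path.join for the output paths. outdir + ident only inserts a separator if outdir already ends with one, which the input() prompt does not guarantee.

check_call, which is available in Python 3.2, raises CalledProcessError if sixpack exits with a non-zero status. sixpack_args and run_sixpack are hypothetical names.

```python
import os
import subprocess

# Path taken from the question; the r prefix keeps the backslashes literal.
SIXPACK = r"C:\memboss\sixpack.exe"


def sixpack_args(infile, outdir, ident):
    """Build sixpack's argv: each option and each value is its own list element."""
    return [
        SIXPACK,
        "-sequence", infile,
        "-outseq", os.path.join(outdir, ident + ".sixpack"),
        "-outfile", os.path.join(outdir, ident + ".format"),
    ]


def run_sixpack(infile, outdir, ident):
    """Run sixpack; check_call raises CalledProcessError on a non-zero exit status."""
    subprocess.check_call(sixpack_args(infile, outdir, ident))
```

Inside the question's loop, run_sixpack(infile, outdir, ident) would replace the subprocess.call line.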

Invalid Python syntax using file.write

Trying to learn some geospatial Python, more or less following the class notes here.
My code:
#!/usr/bin/python
# import modules
import ogr, sys, os
# set working dir
os.chdir('/home/jacques/misc/pythongis/data')
# create the text file we're writing to
file = open('data_export.txt', 'w')
# import the required driver for .shp
driver = ogr.GetDriverByName('ESRI Shapefile')
# open the datasource
data = driver.Open('road_surveys.shp', 1)
if data is None:
    print 'Error, could not locate file'
    sys.exit(1)
# grab the datalayer
layer = data.GetLayer()
# loop through the features
feature = layer.GetNextFeature()
while feature:
    # acquire attributes
    id = feature.GetFieldAsString('Site_Id')
    date = feature.GetFieldAsString('Date')
    # get coordinates
    geometry = feature.GetGeometryRef()
    x = str(geometry.GetX())
    y = str(geometry.GetY()
    # write to the file
    file.Write(id + ' ' + x + ' ' + y + ' ' + cover + '\n')
    # remove the current feature, and get a new one
    feature.Destroy()
    feature = layer.GetNextFeature()
# close the data source
datasource.Destroy()
file.close()
Running that gives me the following:
File "shape_summary.py", line 38
file.write(id + ' ' + x + ' ' + y + ' ' + cover + '\n')
^
SyntaxError: invalid syntax
Running Python 2.7.1
Any help would be fantastic!
The previous line is missing a closing parenthesis:
y = str(geometry.GetY())
Also, just a style comment: it's a good idea to avoid using file as a variable name in Python 2, because it is the name of a built-in type. Try opening a new Python session and running help(file).
1) write shouldn't be upper case in your code (Python is case-sensitive).
2) Make sure id is a string; if it isn't, use str(id) in the expression, and the same for cover, x and y.
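Putting both answers together, here is a Python 3 sketch of the export loop. Every value goes through str(), the lower-case write() method is used, and the output handle is not called file. The question's last lines also call datasource.Destroy(), although the data source was stored in data.

Two assumptions are labelled in the code. First, cover is never assigned in the question, so it is read here from a hypothetical Cover field. Second, the layer only needs to provide the GetNextFeature, GetFieldAsString and GetGeometryRef calls the question already uses. format_row, write_rows and rows_from_layer are hypothetical names.

```python
from typing import Iterable, Iterator, TextIO, Tuple

Row = Tuple[object, float, float, object]


def format_row(site_id: object, x: float, y: float, cover: object) -> str:
    """Convert every value with str() before joining (answer 2), then end the line."""
    return " ".join(str(value) for value in (site_id, x, y, cover)) + "\n"


def write_rows(rows: Iterable[Row], out_file: TextIO) -> int:
    """Write one line per row with the lower-case write() method; return the row count."""
    count = 0
    for row in rows:
        out_file.write(format_row(*row))
        count += 1
    return count


def rows_from_layer(layer, cover_field: str = "Cover") -> Iterator[Row]:
    """Yield (site_id, x, y, cover) from an OGR layer using the calls from the question.

    Assumption: "Cover" is a hypothetical field name; the question never
    shows where cover should come from.
    """
    feature = layer.GetNextFeature()
    while feature:
        geometry = feature.GetGeometryRef()
        yield (
            feature.GetFieldAsString("Site_Id"),
            geometry.GetX(),
            geometry.GetY(),
            feature.GetFieldAsString(cover_field),
        )
        feature = layer.GetNextFeature()
```

With an OGR layer in hand, the export becomes: with open('data_export.txt', 'w') as out_file: write_rows(rows_from_layer(layer), out_file).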
