How to call pypdfocr functions to use them in a python script? - python

Recently I downloaded pypdfocr, however, in the documentation there are no examples of how to call pypdfocr as a library, could anybody help me to call it just to convert a single file?. I just found a terminal command:
$ pypdfocr filename.pdf

If you're looking for the source code, it's normally under the directory site-package of your python installation. What's more, if you're using a IDE (i.e. Pycharm), it would help you find the directory and file. This is extremly useful to find class as well and show you how you can instantiate it, for example :
https://github.com/virantha/pypdfocr/blob/master/pypdfocr/pypdfocr.py
this file has a pypdfocr class type you can re-use and, possibly, do what a command-line would do.
In that class, the developper has put a lot of argument to be parsed :
def get_options(self, argv):
"""
Parse the command-line options and set the following object properties:
:param argv: usually just sys.argv[1:]
:returns: Nothing
:ivar debug: Enable logging debug statements
:ivar verbose: Enable verbose logging
:ivar enable_filing: Whether to enable post-OCR filing of PDFs
:ivar pdf_filename: Filename for single conversion mode
:ivar watch_dir: Directory to watch for files to convert
:ivar config: Dict of the config file
:ivar watch: Whether folder watching mode is turned on
:ivar enable_evernote: Enable filing to evernote
"""
p = argparse.ArgumentParser(description = "Convert scanned PDFs into their OCR equivalent. Depends on GhostScript and Tesseract-OCR being installed.",
epilog = "PyPDFOCR version %s (Copyright 2013 Virantha Ekanayake)" % __version__,
)
p.add_argument('-d', '--debug', action='store_true',
default=False, dest='debug', help='Turn on debugging')
p.add_argument('-v', '--verbose', action='store_true',
default=False, dest='verbose', help='Turn on verbose mode')
p.add_argument('-m', '--mail', action='store_true',
default=False, dest='mail', help='Send email after conversion')
p.add_argument('-l', '--lang',
default='eng', dest='lang', help='Language(default eng)')
p.add_argument('--preprocess', action='store_true',
default=False, dest='preprocess', help='Enable preprocessing. Not really useful now with improved Tesseract 3.04+')
p.add_argument('--skip-preprocess', action='store_true',
default=False, dest='skip_preprocess', help='DEPRECATED: always skips now.')
#---------
# Single or watch mode
#--------
single_or_watch_group = p.add_mutually_exclusive_group(required=True)
# Positional argument for single file conversion
single_or_watch_group.add_argument("pdf_filename", nargs="?", help="Scanned pdf file to OCR")
# Watch directory for watch mode
single_or_watch_group.add_argument('-w', '--watch',
dest='watch_dir', help='Watch given directory and run ocr automatically until terminated')
#-----------
# Filing options
#----------
filing_group = p.add_argument_group(title="Filing optinos")
filing_group.add_argument('-f', '--file', action='store_true',
default=False, dest='enable_filing', help='Enable filing of converted PDFs')
#filing_group.add_argument('-c', '--config', type = argparse.FileType('r'),
filing_group.add_argument('-c', '--config', type = lambda x: open_file_with_timeout(p,x),
dest='configfile', help='Configuration file for defaults and PDF filing')
filing_group.add_argument('-e', '--evernote', action='store_true',
default=False, dest='enable_evernote', help='Enable filing to Evernote')
filing_group.add_argument('-n', action='store_true',
default=False, dest='match_using_filename', help='Use filename to match if contents did not match anything, before filing to default folder')
# Add flow option to single mode extract_images,preprocess,ocr,write
args = p.parse_args(argv)
You can use any of those argument to be passed to it's parser, like this :
import pypdfocr
obj = pypdfocr.pypdfocr.pypdfocr()
obj.get_options([]) # this makes it takes default, but you could add CLI option to it. Other option might be [-v] or [-d,-v]
I hope this help you understand in the mean time :)

Related

Running Argpars but got this error SystemExit 2

I am trying to set up training arguments and parse but I got this error could anyone help please!
parser = argparse.ArgumentParser(description='Explore pre-trained AlexNet')
parser.add_argument(
'--image_path', type=str,
help='Full path to the input image to load.')
parser.add_argument(
'--use_pre_trained', type=bool, default=True,
help='Load pre-trained weights?')
args = parser.parse_args()
got this error
usage: ipykernel_launcher.py [-h] [--image_path IMAGE_PATH]
[--use_pre_trained USE_PRE_TRAINED]
ipykernel_launcher.py: error: unrecognized arguments: -f /root/.local/share/jupyter/runtime/kernel-ff8e2476-e39b-4e40-b8f9-6b8113fe8f1f.json
An exception has occurred, use %tb to see the full traceback.
SystemExit: 2
In a Jupyter notebook cell:
import sys
sys.argv
I get get
['/usr/local/lib/python3.8/dist-packages/ipykernel_launcher.py',
'-f',
'/home/paul/.local/share/jupyter/runtime/kernel-7923bfd2-9f96-45cf-8b44-1859a2185715.json']
The Jupyter server is using the sys.argv to set up the communication channel with your kernel. argparse parses this list too.
So commandline and argparse cannot be used to provide arguments to your notebook when run this way.
How did you start this script? Did you even try to provide the commandline values that the script expected?
'--image_path'
'--use_pre_trained'
If you did, you probably would have gotten a different parser's error, about 'unexpected arguments'. That's coming from the server.
If you use a Colab may this solution help you, this solution suggest to you to write argparse in the other python file introduction-argparse-colab
%%writefile parsing.py
import argparse
parser = argparse.ArgumentParser()
parser.parse_args()
In a Jupyter notebook code works like this:
parser = argparse.ArgumentParser()
parser.add_argument('--image_folder', type=str, default='/content/image', help='path to image folder')
parser.add_argument("-f", "--file", required=False)
You need to add in the end of " parser.add_argument " line ( This is the reason you are getting 'SystemExit: 2' error ):
parser.add_argument("-f", required=False)
in your case this should work:
parser = argparse.ArgumentParser(description='Explore pre-trained AlexNet')
parser.add_argument('--image_path', type=str, default='your_path', help='Full path to the input image.')
parser.add_argument('--use_pre_trained', type=bool, default=True, help='Load pre-trained weights?')
parser.add_argument("-f", "--file", required=False)
args = parser.parse_args()
Now you can call:
image = args.image_path
Or
from PIL import Image
image = Image.open(args.image_path)
Tested in Google colab

Confusion in using argparse in Python

I would like to execute a code which is accessible from this link.
The code objective is to read and extract the pdf annotation.
However, I am not sure how to direct the pdf file path using the argparse, which I suspect, should be the following argparse.
p.add_argument("input", metavar="INFILE", type=argparse.FileType("rb"),
help="PDF files to process", nargs='+')
Say, I know the absolute path of the pdf file is as follow;
C:\abc.pdf
Also, given that I still try to comprehend the code, so I would like to avoid re-enter the path C:\abc.pdf over and over again. Is there are way, I can temporary hard coded it within the def parse_args()
I have read several thread about this topic, however, still having difficulties in comprehend about this issue.
Thanks in advance for any insight.
You add the argument to a parser:
import argparse
p = argparse.ArgumentParser()
p.add_argument("input", metavar="INFILE", type=argparse.FileType("rb"),
help="PDF files to process", nargs='+')
args = p.parse_args()
Then args.input will be a tuple of one or more open file handles.
If you're running pdfannots.py from the command prompt, then you do so as, e.g.,
python pdfannots.py C:\abc.pdf
If you want to run it over multiple PDF files, then you do so as, e.g.,
python pdfannots.py C:\abc.pdf D:\xyz.pdf E:\foo.pdf
If you really want to hard code the path, you'll have to edit pdfannots.py as follows:
def parse_args():
p = argparse.ArgumentParser(description=__doc__)
# p.add_argument("input", metavar="INFILE", type=argparse.FileType("rb"),
# help="PDF files to process", nargs='+')
g = p.add_argument_group('Basic options')
g.add_argument("-p", "--progress", default=False, action="store_true",
help="emit progress information")
g.add_argument("-o", metavar="OUTFILE", type=argparse.FileType("w"), dest="output",
default=sys.stdout, help="output file (default is stdout)")
g.add_argument("-n", "--cols", default=2, type=int, metavar="COLS", dest="cols",
help="number of columns per page in the document (default: 2)")
g = p.add_argument_group('Options controlling output format')
allsects = ["highlights", "comments", "nits"]
g.add_argument("-s", "--sections", metavar="SEC", nargs="*",
choices=allsects, default=allsects,
help=("sections to emit (default: %s)" % ', '.join(allsects)))
g.add_argument("--no-group", dest="group", default=True, action="store_false",
help="emit annotations in order, don't group into sections")
g.add_argument("--print-filename", dest="printfilename", default=False, action="store_true",
help="print the filename when it has annotations")
g.add_argument("-w", "--wrap", metavar="COLS", type=int,
help="wrap text at this many output columns")
return p.parse_args()
def main():
args = parse_args()
global COLUMNS_PER_PAGE
COLUMNS_PER_PAGE = args.cols
for file in [open(r"C:\Users\jezequiel\Desktop\Timeline.pdf", "rb")]:
(annots, outlines) = process_file(file, args.progress)
pp = PrettyPrinter(outlines, args.wrap)
if args.printfilename and annots:
print("# File: '%s'\n" % file.name)
if args.group:
pp.printall_grouped(args.sections, annots, args.output)
else:
pp.printall(annots, args.output)
return 0

Displaying help through command line argument

I have a python program that I am running through command line arguments. I have used sys module.
Below is my test.py Python file where I am taking all the args:
if len(sys.argv) > 1:
files = sys.argv
get_input(files)
The get_input method is in another Python file where I have the options defined.
options = {
'--case1': case1,
'--case2': case2,
}
def get_input(arguments):
for file in arguments[1:]:
if file in options:
options[file]()
else:
invalid_input(file)
To run:
python test.py --case1 --case2
My intentions are that I want to show the user all the commands in case they want to read the docs for that.
They should be able to read all the commands like they usually are in all the package for reading help, python test.py --help . With this they should be able to look into all the commands they can run.
How do I do this?
One of the best quality a Python developer can be proud of is to use built-in libraries instead of custom ones. So let's use argparse:
import argparse
# define your command line arguments
parser = argparse.ArgumentParser(description='My application description')
parser.add_argument('--case1', help='It does something', action='store_true')
parser.add_argument('--case2', help='It does something else, I guess', action='store_true')
# parse command line arguments
args = parser.parse_args()
# Accessing arguments values
print('case1 ', args.case1)
print('case2 ', args.case2)
You can now use your cmd arguments like python myscript.py --case1
This comes with a default --help argument you can now use like: python myscript.py --help which will output:
usage: myscript.py [-h] [--case1] [--case2]
My application description
optional arguments:
-h, --help show this help message and exit
--case1 It does something
--case2 It does something else, I guess
Hi you can use option parser and add your options and related help information.
It has by default help option which shows all the available options which you have added.
The detailed document is here. And below is the example.
from optparse import OptionParser
parser = OptionParser()
parser.add_option("-f", "--file", dest="filename",
help="write report to FILE", metavar="FILE")
parser.add_option("-q", "--quiet",
action="store_false", dest="verbose", default=True,
help="don't print status messages to stdout")
(options, args) = parser.parse_args()

Python code to redirect stdin and stdout to a shell script

I am trying to write a program that reads from a file and writes to a new file using the command line. I am using the CommandLine class and using ArgParse to accept different args as well as the file names for input and output. I am trying to redirect stdin and stdout to these files. However, I keep getting an error that I put at the bottom of the code below. Am I inputting the arguments to the command line incorrectly, or is something else going on? All of my files are in the same folder.
class CommandLine() :
'''
Handle the command line, usage and help requests.
CommandLine uses argparse, now standard in 2.7 and beyond.
it implements a standard command line argument parser with various argument options,
a standard usage and help, and an error termination mechanism do-usage_and_die.
attributes:
all arguments received from the commandline using .add_argument will be
avalable within the .args attribute of object instantiated from CommandLine.
For example, if myCommandLine is an object of the class, and requiredbool was
set as an option using add_argument, then myCommandLine.args.requiredbool will
name that option.
'''
def __init__(self, inOpts=None):
'''
CommandLine constructor.
Implements a parser to interpret the command line argv string using argparse.
'''
import argparse
self.parser = argparse.ArgumentParser(description = 'Program prolog - a brief description of what this thing does',
epilog = 'Program epilog - some other stuff you feel compelled to say',
add_help = True, #default is True
prefix_chars = '-',
usage = '%(prog)s [options] -option1[default] <input >output'
)
self.parser.add_argument('inFile', action = 'store', help='input file name')
self.parser.add_argument('outFile', action = 'store', help='output file name')
self.parser.add_argument('-lG', '--longestGene', action = 'store', nargs='?', const=True, default=False, help='longest Gene in an ORF')
self.parser.add_argument('-mG', '--minGene', type=int, choices= (0, 100, 200, 300, 500, 1000), action = 'store', help='minimum Gene length')
self.parser.add_argument('-s', '--start', action = 'append', nargs='?', help='start Codons') #allows multiple list options
self.parser.add_argument('-st', '--stop', action = 'append', nargs='?', help='stop Codons') #allows multiple list options
self.parser.add_argument('-v', '--version', action='version', version='%(prog)s 0.1')
if inOpts is None :
self.args = self.parser.parse_args()
else :
self.args = self.parser.parse_args(inOpts)
C:\Users\Zach\Documents\UCSC\BME 160\Lab 5>python findORFsCmdLine.py -- minGene=3
00 --longestGene --start=ATG --stop=TAG --stop=TGA --stop=TAA <tass2.fa >result.
txt
usage: findORFsCmdLine.py [options] -option1[default] <input >output
findORFsCmdLine.py: error: the following arguments are required: inFile, outFile
The error tells you that you did not provide inFile and outFile. This seems confusing because you did indeed provide <tass2.fa and >result.txt on the command line.
On the command line < and > have special meaning. <tass2.fa is handled by bash (or whichever shell you use) and it opens the file tass2.fa and sends the contents to your program over stdin. It then removes that from the list of command line arguments because bash has already taken care of it.
A similar thing happens with >result.txt where any output from your file that normally goes to stdout such as print calls will be written to that file. Again bash handles this so your program never gets the argument from the command line.
To make sure the filenames go to your program you need to remove the < and > symbols.
In addition to that, the way you are call add_argument for inFile and outFile is not correct. Using action='store' simply stores the value of the argument under the given name. This is the default action so doing that is redundant.
You need to tell add_argument that these are file types. Then you can use these as files and read and write as you desire.
parser.add_argument('inFile', type=argparse.FileType('r'))
parser.add_argument('outFile', type=argparse.FileType('w'))
If you wish to allow these arguments to be optional and use stdin and stdout automatically if they do not supply a filename then you can do this:
parser.add_argument('inFile', type=argparse.FileType('r'),
nargs='?', default=sys.stdin)
parser.add_argument('outFile', type=argparse.FileType('w'),
nargs='?', default=sys.stdout)
You say you are writing a program that reads from a file and writes to a new file using the command line and go on to say that you are trying to redirect these files to stdin and stdout.
If it is really your intention to redirect the input and output files using the shell with the > and < operators, then you do not need to call add_argument with outFile at all. Just let the shell handle it. Remove that line from your program and use print or similar to send your content to stdout and that file will be created. You will still need the call add_argument('inFile', ....

Python/argparse: How to make an argument (i.e. --clear) don't warn me "error: too few arguments"?

I have this cmd line parsing stuf and I need to make "--clear" a valid unique parameter. However I'm getting an "Error: Too few arguments" when I only use "--clear" as an unique parameter.
parser = argparse.ArgumentParser(
prog="sl",
formatter_class=argparse.RawDescriptionHelpFormatter,
description="Shoot/Project launcher application",
epilog="")
parser.add_argument("project", metavar="projectname",
help="Name of the project/shot to use")
parser.add_argument("-p", metavar="project_name",
help="Name of the project")
parser.add_argument("-s", metavar="shot_name",
help="Name of the shot")
parser.add_argument("--clear",action='store_true',
help="Clear the information about the current selected project")
parser.add_argument("--test",
help="test parameter")
args=parser.parse_args()
Any ideas? Thanks
Update:
Trying to answer some questions of the comments.
When I launch the app like:
sl project
it works fine.
But if I launch it like:
sl --clear
I got a simple "sl: error: too few arguments"
The --clear argument is not the problem here; project is a required argument.
If you should be able to call your program without naming a project, make project optional by adding nargs='?':
parser.add_argument("project", metavar="projectname",
help="Name of the project/shot to use", nargs='?')
If it is an error to not specify a project name when other command-line switches are used, do so explicitly after parsing:
args = parser.parse_args()
if not args.clear and args.project is None:
parser.error('Please provide a project')
Calling parser.error() prints the error message, the help text and exits with return code 2:
$ python main.py --clear
Namespace(clear=True, p=None, project=None, s=None, test=None)
$ python main.py
usage: sl [-h] [-p project_name] [-s shot_name] [--clear] [--test TEST]
[projectname]
sl: error: Please provide a project
Add default values for each argument (--clear is no exception)

Categories