I need to enter the contents of a text (.txt) file as input for a Python (.py) file. Assuming the name of the text file is TextFile and the name of the Python file PythonFile, then the code should be as follows:
python PythonFile.py < TextFile.txt
Yet, when I try to do this in IDLE and type in
import PythonFile < TextFile,
IDLE gives me an invalid syntax message, pointing to the < sign. I tried all sorts of variations on this theme (i.e.,using or not using the file name extensions), but still got the same invalid-syntax message. How is the syntax different for input redirection in IDLE?
If it works in the command line, then why do you want to do this in IDLE? There are ways to achieve a similar result using, for example, subprocess, but a better way would be to refactor PythonFile.py so that you can call a function from it, e.g.:
>>> import PythonFile
>>> PythonFile.run_with_input('TextFile.txt')
If you post the contents of PythonFile.py, we might be able to help you do this.
Related
I need to extract text from a PDF. I tried the PyPDF2, but the textExtract method returned an encrypted text, even though the pdf is not encrypted acoording to the isEncrypted method.
So I moved on to trying accessing a program that does the job from the command prompt, so I could call it from python with the subprocess module. I found this program called textExtract, which did the job I wanted with the following command line on cmd:
"textextract.exe" "download.pdf" /to "download.txt"
However, when I tried running it with subprocess I couldn't get a 0 return code.
Here is the code I tried:
textextract = shlex.split(r'"textextract.exe" "download.pdf" /to "download.txt"')
subprocess.run(textextract)
I already tried it with shell=True, but it didn't work.
Can anyone help me?
I was able to get the following script to work from the command line after installing the PDF2Text Pilot application you're trying to use:
import shlex
import subprocess
args = shlex.split(r'"textextract.exe" "download.pdf" /to "download.txt"')
print('args:', args)
subprocess.run(args)
Sample screen output of running it from a command line session:
> C:\Python3\python run-textextract.py
args: ['textextract.exe', 'download.pdf', '/to', 'download.txt']
Progress:
Text from "download.pdf" has been successfully extracted...
Text extraction has been completed!
The above output was generated using Python 3.7.0.
I don't know if your use of spyder on anaconda affects things or not since I'm not familiar with it/them. If you continue to have problems with this, then, if it's possible, I suggest you see if you can get things working directly—i.e. running the the Python interpreter on the script manually from the command line similar to what's shown above. If that works, but using spyder doesn't, then you'll at least know the cause of the problem.
There's no need to build a string of quoted strings and then parse that back out to a list of strings. Just create a list and pass that:
command=["textextract.exe", "download.pdf", "/to", "download.txt"]
subprocess.run(command)
All that shlex.split is doing is creating a list by removing all of the quotes you had to add when creating the string in the first place. That's an extra step that provides no value over just creating the list yourself.
So I'm using approach in this post
to extract a double quoted string from a string. If the input string comes from terminal argument, it works fine. But if the input string comes from a txt file like the following, it gives nontype error. I tried to get the hash code for two strings(one from file and one from terminal) with identical txt content, and turns out they are different. I'm curious if anyone knows how to solve this?(in Python 3.x)
That said, I have set the default encoding to "utf-8" in my code.
python filename.py < input.txt
If you are using command python, the command recognize it to python 2.x.
If you want python 3.x, just change the command to python3
like this
python3 filename.py < input.txt
Two things, if you want to ingest a txt file into a python script, you need to specify it. Add these two lines
import sys
text = str(sys.argv[1])
this mean text would be your 'input.txt'.
Second, if your script has only a function, it would not know what you want to do with the function, you have to either, tell the script explicity to execute the function through the entry main
import re
import sys
def doit(text):
matches=re.findall(r'\"(.+?)\"',text)
# matches is now ['String 1', 'String 2', 'String3']
return ",".join(matches)
if __name__ == '__main__':
text_file = str(sys.argv[1])
text = open(text_file).read()
print(doit(text))
Alternately, you can just execute line by line without wrapping the re in a function, since it is only one line.
I just figure it out, the bug doesn't come from my code. I had the "smart quotes" enabled on my Mac so whenever it reads a quote, it's identified as a special character. Disable this under keyboard setting would do the trick.
LOL what a "bug".
i'm currently solving optimization problems with python. One problem takes about 5hrs to solve which result will appear (sadly) in console but not completely since it's a very big list. I'm wondering right now if there is anyway to get the solution of the list completely while i'm waiting for the results (i've been waiting for 4hrs. now).
If there is a way i can get the complete result in the console after the code has runned, let me know.
If there is a way via console to put the results in a .txt file or any type of readable file afterwards please let me know.
If there is no solution to this problem, let me know.
Thanks alot.
say your script is called yourscript.py
and you have been running it in the console with: python yourscript.py
if you use this command:
python yourscript.py > newfile.txt
all of the output will go into newfile.txt
Suppose your python file is myfile.py, in order to redirect what is printed in this file run your file as:
python myfile.py > myfile.txt
Your printed list will be printed in myfile.txt.
pipe it into a file with the format you want
for example your file name is xyz.py
In your python shell
python xyz.py > myfile.txt
Your output will now be in myfile.txt
I've hit a issue that I don't really understand how to overcome. I'm trying to create a subprocess in python to run another python script. Not too difficult. The issue is I'm unable to get around is EOF error when a python file includes a super long string.
Here's an example of what my files look like.
Subprocess.py:
### call longstr.py from the primary pyfile
subprocess.call(['python longstr.py'], shell = True)
Longstr.py
### called from subprocess.py
### the actual string is a lot longer; this is an example to illustrate how the string is formatted
lngstr = """here is a really long
string (which has many n3w line$ and "characters")
that are causing the python file to state the file is ending early
"""
print lngstr
Printer error in terminal
SyntaxError: EOF while scanning triple-quoted string literal
As a work around, I tried to remove all linebreaks as well as all spaces to see if it was due to it being multi-line. That still returned the same result.
My assumption is that when the subprocess is running and the shell is doing something with the file contents, when the new line is reached the shell itself is freaking out and that's what's terminating the process; not the file.
What is the correct workaround for having subprocess run a file like this?
Thank you for your help.
Answering my own question here; my problem was that I didn't file.close() before trying to execute a subprocess.call.
If you encounter this problem, and are working with recently written files this could be your issue too. Thank you to everyone who read or responded to this thread.
I am trying to read a dicom header tag in dicom file.
Now, there are two ways to read this dicom header tag.
1) Using pydicom package in python which apparently is not working well on my python installed version(python 3).
2) or when i call AFNI function 'dicom_hinfo' through command line, i can get dicom tag value. The syntax to call afni function in terminal is as follows:
dicom_hinfo -tag aaaa,bbbb filename.dcm
output:fgre
Now how should i call this dicom-info -tag aaaa,bbbb filename.dcm in python script.
I guess subprocess might work but not sure about how to use it in this case.
To get output from a subprocess, you could use check_output() function:
#!/usr/bin/env python
from subprocess import check_output
tag = check_output('dicom_hinfo -tag aaaa,bbbb filename.dcm output:fgre'.split(),
universal_newlines=True).strip()
universal_newlines=True is used to get Unicode text on Python 3 (the data is decoded using user locale's character encoding).
check_output() assumes that dicom_hinfo prints to its standard output stream (stdout). Some utilities may print to stderr or the terminal directly instead. The code could be modified to adapt to that.
Oh this was due to syntax error using Pydicom.
I wanted to access 0019, 109c tag.
Syntax should be:
ds[0x0019,0x109c].value.
not ds[aaaa,bbbb].value