I write a lot of little scripts that process files on a line-by-line basis. In Perl, I use
while (<>) {
do stuff;
}
This is handy because it doesn't care where the input comes from (a file or stdin).
In Python I use this
if len(sys.argv) == 2: # there's a command line argument
sys.stdin = file(sys.argv[1])
for line in sys.stdin.readlines():
do stuff
which doesn't seem very elegant. Is there a Python idiom that easily handles file/stdin input?
The fileinput module in the standard library is just what you want:
import fileinput
for line in fileinput.input(): ...
import fileinput
for line in fileinput.input():
process(line)
This iterates over the lines of all files listed in sys.argv[1:], defaulting to sys.stdin if the list is empty.
fileinput defaults to stdin, so would make it slightly more concise.
If you do a lot of command-line stuff, though, this piping hack is very neat.
Related
I am attempting to cat a CSV file into stdout and then pipe the printed output as input into a python program that also takes a system argument vector with 1 argument. I ran into an issue I think directly relates to how Python's fileinput.input() function reacts with regards to occupying the stdin file descriptor.
generic_user% cat my_data.csv | python3 my_script.py myarg1
Here is a sample Python program:
import sys, fileinput
def main(argv):
print("The program doesn't even print this")
data_list = []
for line in fileinput.input():
data_list.append(line)
if __name__ == "__main__":
main(sys.argv)
If I attempt to run this sample program with the above terminal command and no argument myarg1, the program is able to evaluate and parse the stdin for the data output from the CSV file.
If I run the program with the argument myarg1, it will end up throwing a FileNotFoundError directly related to myarg1 not existing as a file.
FileNotFoundError: [Errno 2] No such file or directory: 'myarg1'
Would someone be able to explain in detail why this behavior takes place in Python and how to handle the logic such that a Python program can first handle stdin data before argv overwrites the stdin descriptor?
You can read from the stdin directly:
import sys
def main(argv):
print("The program doesn't even print this")
data_list = []
for line in iter(sys.stdin):
data_list.append(line)
if __name__ == "__main__":
main(sys.argv)
You are trying to access a file which has not been yet created, hence fileinput cannot open it, but since you are piping the data you have no need for it.
This is by design. The conceptors of fileinput thought that there were use cases where reading from stdin would be non sense and just provided a way to specifically add stdin to the list of files. According to the reference documentation:
import fileinput
for line in fileinput.input():
process(line)
This iterates over the lines of all files listed in sys.argv[1:], defaulting to sys.stdin if the list is empty. If a filename is '-', it is also replaced by sys.stdin.
Just keep your code and use: generic_user% cat my_data.csv | python3 my_script.py - myarg1
to read stdin before myarg1 file or if you want to read it after : ... python3 my_script.py myarg1 -
fileinput implements a pattern common for Unix utilities:
If the utility is called with commandline arguments, they are files to read from.
If it is called with no arguments, read from standard input.
So fileinput works exactly as intended. It is not clear what you are using commandline arguments for, but if you don't want to stop using fileinput, you should modify sys.argv before you invoke it.
some_keyword = sys.argv[1]
sys.argv = sys.argv[:1] # Retain only argument 0, the command name
for line in fileinput.input():
...
Using Python in NetBeans and having some trouble to set up file arguments as input/output. For instance:
import re, sys
for line in sys.stdin:
for token in re.split("\s+", line.strip()):
print(token)
Command line usage python splitprog.py < input.txt > output.txt works great. But in NetBeans the output window just waits, with nothing happening even if one give a file name (tested many combinations).
The Application Arguments row in project properties (where one would enter these files for a Java project) doesn’t seem to be used either, as the behaviour is the same regardless of whether there are file names/paths there or not. Is there some trick to get this to work or are file args currently unusable when it comes to Python in NetBeans?
ADD: As per suggestion by #John Zwinck, an example solution:
import re, sys
with open(sys.argv[1]) as infile:
with open(sys.argv[2], "w") as outfile:
for line in infile:
for token in re.split("\s+", line.strip()):
print(token, file = outfile)
Argument files are set in NB project properties. In command prompt, the programme is now simply run by python splitprog.py input.txt output.txt.
When you do this:
python splitprog.py < input.txt > output.txt
You are redirecting input.txt to stdin of python, and stdout of python to output.txt. You aren't using command line arguments to splitprog.py at all.
NetBeans does not support this.
Instead, you should pass the filenames as arguments, like this:
python splitprog.py input.txt output.txt
Then in NetBeans you just set the command line arguments to input.txt output.txt and it will work the same as the above command line in the shell. You'll need to modify your program slightly, perhaps like this:
with open(sys.argv[1]) as infile:
for line in infile:
# ...
If you still want to support stdin and stdout, one convention is to use - to mean those standard streams, so you could code your program to support this:
python splitprog.py - - < input.txt > output.txt
That is, you can write your program to understand - as "use the standard stream from the shell", if you need to support the old way of doing things. Or just default to this behavior if no command line arguments are given.
I am trying to test a simple code that reads a file line-by-line with Pycharm.
for line in sys.stdin:
name, _ = line.strip().split("\t")
print name
I have the file I want to input in the same directory: lib.txt
How can I debug my code in Pycharm with the input file?
You can work around this issue if you use the fileinput module rather than trying to read stdin directly.
With fileinput, if the script receives a filename(s) in the arguments, it will read from the arguments in order. In your case, replace your code above with:
import fileinput
for line in fileinput.input():
name, _ = line.strip().split("\t")
print name
The great thing about fileinput is that it defaults to stdin if no arguments are supplied (or if the argument '-' is supplied).
Now you can create a run configuration and supply the filename of the file you want to use as stdin as the sole argument to your script.
Read more about fileinput here
I have been trying to find a way to use reading file as stdin in PyCharm.
However, most of guys including jet brains said that there is no way and no support, it is the feature of command line which is not related PyCharm itself.
* https://intellij-support.jetbrains.com/hc/en-us/community/posts/206588305-How-to-redirect-standard-input-output-inside-PyCharm-
Actually, this feature, reading file as stdin, is somehow essential for me to ease giving inputs to solve a programming problem from hackerank or acmicpc.
I found a simple way. I can use input() to give stdin from file as well!
import sys
sys.stdin = open('input.in', 'r')
sys.stdout = open('output.out', 'w')
print(input())
print(input())
input.in example:
hello world
This is not the world ever I have known
output.out example:
hello world
This is not the world ever I have known
You need to create a custom run configuration and then add your file as an argument in the "Script Parameters" box. See Pycharm's online help for a step-by-step guide.
However, even if you do that (as you have discovered), your problem won't work since you aren't parsing the correct command line arguments.
You need to instead use argparse:
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("filename", help="The filename to be processed")
args = parser.parse_args()
if args.filename:
with open(filename) as f:
for line in f:
name, _ = line.strip().split('\t')
print(name)
For flexibility, you can write your python script to always read from stdin and then use command redirection to read from a file:
$ python myscript.py < file.txt
However, as far as I can tell, you cannot use redirection from PyCharm as Run Configuration does not allow it.
Alternatively, you can accept the file name as a command-line argument:
$ python myscript.py file.txt
There are several ways to deal with this. I think argparse is overkill for this situation. Alternatively, you can access command-line arguments directly with sys.argv:
import sys
filename = sys.argv[1]
with open(filename) as f:
for line in f:
name, _ = line.strip().split('\t')
print(name)
For robust code, you can check that the correct number of arguments are given.
Here's my hack for google code jam today, wish me luck. Idea is to comment out monkey() before submitting:
def monkey():
print('Warning, monkey patching')
global input
input = iter(open('in.txt')).next
monkey()
T = int(input())
for caseNum in range(1, T + 1):
N, L = list(map(int, input().split()))
nums = list(map(int, input().split()))
edit for python3:
def monkey():
print('Warning, monkey patching')
global input
it = iter(open('in.txt'))
input = lambda : next(it)
monkey()
abc=123
dabc=123
abc=456
dabc=789
aabd=123
From the above file I need to find lines beginning with abc= (whitespaces doesn't matter)
in ruby I would put this in an array and do
matches = input.grep(/^\s*abc=.*/).map(&:strip)
I'm a totally noob in Python, even said I'm a fresh Python developer is too much.
Maybe there is a better "Python way" of doing this without even grepping ?
The Python version I have available on the platform where I need to solve the problem is 2.6
There is no way of use Ruby at that time
with open("myfile.txt") as myfile:
matches = [line.rstrip() for line in myfile if line.lstrip().startswith("abc=")]
In Python you would typically use a list comprehension whose if clause does what you'd accomplish with Ruby's grep:
import sys, re
matches = [line.strip() for line in sys.stdin
if re.match(r'^\s*abc=.*', line)]
I have a lot of Perl scripts that looks something like the following. What it does is that it will automatically open any file given as a command line argument and in this case print the content of that file. If no file is given it will instead read from standard input.
while ( <> ) {
print $_;
}
Is there a way to do something similar in Python without having to explicitly open each file?
The fileinput module in Python's standard library is designed exactly for this purpose, and I quote a bit of code from the URL I just gave:
import fileinput
for line in fileinput.input():
process(line)
Use print in lieu of process and you have the exact equivalent of your Perl code.
You could look into sys.argv. It may help.