stdin stdout python: how to reuse the same input file twice?

I am quite new to Python and even newer to working with stdin and stdout. Nevertheless, I need to make my script usable with UNIX commands so that, for example, it can process two input files at once.
This script works perfectly well with command-line arguments:
newlist = []
def f1()
    ....
def f2(input_file):
    vol_id = sys.argv[3]
    for line in input_file:
        if ... :
            line = line.replace('abc','def')
            line = line.replace('id', 'id'+vol_id)
        ....
        newlist.append(line)
    return newlist
def main():
    if len(sys.argv) < 4:
        print 'usage: ./myscript.py [file_in... file_out... volume_id]'
        sys.exit(1)
    else:
        filename = sys.argv[1]
        filename_out = sys.argv[2]
        tree = etree.parse(filename)
        extract(tree)
        input_file = open(filename, 'rU')
        change_class(input_file)
        file_new = open(filename_out, 'w')
        for x in newlist:
            if '\n' in x:
                x = x.replace('\n', '')
            print>>file_new, x
When I tried to add stdin/stdout support, I first had a problem with reading the same input file twice, so I made some changes so that it is actually opened only once. Here is my modified main():
filename = sys.argv[1]
filename_out = sys.argv[2]
if filename == '-':
    filename = sys.stdin
else:
    input_file = open(filename, 'rU')
if filename_out == '-':
    filename_out = sys.stdout
    file_new = filename_out
else:
    file_new = open(filename_out, 'w')
input_file = open(filename, 'rU')
tree = etree.fromstring(input_file)
extract(tree)
change_class(input_file)
for x in newlist:
    if '\n' in x:
        x = x.replace('\n', '')
    print>>file_new, x
Then I ran my script like this:
./myscript.py - - volumeid < inputfile > outputfile
And I got this error message:
Traceback (most recent call last):
File "./myscript.py", line 191, in <module>
main()
File "./myscript.py", line 175, in main
input_file = open(filename, 'rU')
TypeError: coercing to Unicode: need string or buffer, file found
What am I doing wrong?

You are trying to use an open file object as a filename:
filename = sys.stdin
# ...
input_file = open(filename, 'rU')
You cannot re-read from sys.stdin anyway; you need to read all of the file into memory, then process it twice:
if filename == '-':
    input_file = sys.stdin
else:
    input_file = open(filename, 'rU')
input_data = input_file.read()
tree = etree.fromstring(input_data)
extract(tree)
change_class(input_data)
where you'll have to alter change_class to handle a string rather than an open file object.
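A minimal Python 3 sketch of that approach. It assumes the standard library's xml.etree.ElementTree in place of lxml's etree, and it uses two hypothetical helpers, read_input and two_passes, as stand-ins for the question's main() and change_class. The input is read exactly once (from a file, or from standard input when the name is '-'), and the resulting string is then used for both passes.

```python
import sys
import xml.etree.ElementTree as ET


def read_input(filename, stdin=None):
    """Return the whole input as one string; '-' means standard input.

    stdin may be passed explicitly (e.g. an io.StringIO) for testing;
    otherwise sys.stdin is used.
    """
    if filename == "-":
        return (stdin if stdin is not None else sys.stdin).read()
    with open(filename, encoding="utf-8") as f:
        return f.read()


def two_passes(data, volume_id):
    """Use the same in-memory text twice: once as XML, once as lines."""
    # Pass 1: parse the whole document (stands in for etree.fromstring + extract)
    root = ET.fromstring(data)
    ids = [el.get("id") for el in root.iter() if el.get("id") is not None]
    # Pass 2: walk the same string line by line (stands in for change_class)
    lines = [line.replace('id="', 'id="' + volume_id)
             for line in data.splitlines()]
    return ids, lines
```

With the question's command line, read_input(sys.argv[1]) returns the full document whether it was piped in with < inputfile or named directly, so nothing ever needs to be reopened.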

Related

does not print into the original file

In the last two lines, file1 stays blank even though I call write(). The rest of the code works flawlessly:
def modQuantity(filepath: str,):
    model = input("Model: ")
    size = input("size")
    newquantity = input("New Quantity: ")
    file = open(filepath, 'r')
    tempfile = open(filepath+"temp", 'w')
    for line in file:
        sep = line.split()
        if sep[0] == model and sep[1] == size:
            tempfile.write(f"{sep[0]} {sep[1]} {newquantity}\n")
        else:
            tempfile.write(f"{line}")
    tempfile.close()
    file.close()
    tempfile1 = open(filepath+"temp", 'r')
    file1 = open(filepath, 'w')
    for line1 in tempfile1:
        file1.write(f"{line1}")
You never close file1, so its buffered content hasn't been flushed to disk yet. You can call file1.flush() yourself or close the file.
I recommend using a context manager (a with statement) so you can be sure the file is flushed and closed:
with open(filepath, 'w') as file1:
    for line1 in tempfile1:
        file1.write(f"{line1}")
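A sketch of the whole rewrite using with blocks. It assumes Python 3 and passes model, size and quantity as arguments instead of calling input() (update_quantity is a hypothetical name). Each file is flushed and closed as soon as its with block ends, so the copy-back reads the complete temporary file and the original file ends up fully written.

```python
import os


def update_quantity(filepath, model, size, new_quantity):
    """Rewrite the quantity on the line matching model and size.

    Hypothetical signature: the arguments replace modQuantity's input() prompts.
    """
    tmp_path = filepath + "temp"
    # First pass: write the edited copy; both files are closed when the block exits
    with open(filepath) as src, open(tmp_path, "w") as tmp:
        for line in src:
            sep = line.split()
            if len(sep) >= 2 and sep[0] == model and sep[1] == size:
                tmp.write(f"{sep[0]} {sep[1]} {new_quantity}\n")
            else:
                tmp.write(line)
    # Second pass: copy back; leaving this block flushes and closes the original file
    with open(tmp_path) as tmp, open(filepath, "w") as dst:
        for line in tmp:
            dst.write(line)
    os.remove(tmp_path)  # clean up the temporary copy
```

Removing the temporary file at the end is an addition of this sketch; os.replace(tmp_path, filepath) would be an alternative to copying the lines back one by one.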

How to implement STDOUT and file write based on parameter input

I have the input file that looks like this (infile.txt):
a x
b y
c z
I want to implement a program that lets the user write either to STDOUT or to a file, depending on the command:
python mycode.py infile.txt outfile.txt
writes to a file, and
python mycode.py infile.txt #2nd case
writes to STDOUT.
I'm stuck with this code:
import sys
import csv
nof_args = len(sys.argv)
infile = sys.argv[1]
print nof_args
outfile = ''
if nof_args == 3:
    outfile = sys.argv[2]
# infile is very large,
# so we can't load it into a data structure (e.g. a list) for further processing
with open(infile, 'rU') as tsvfile:
    tabreader = csv.reader(tsvfile, delimiter=' ')
    with open(outfile, 'w') as file:
        for line in tabreader:
            outline = "__".join(line)
            # and more processing
            if nof_args == 3:
                file.write(outline + "\n")
            else:
                print outline
    file.close()
The 2nd case produces:
Traceback (most recent call last):
File "test.py", line 18, in <module>
with open(outfile, 'w') as file:
IOError: [Errno 2] No such file or directory: ''
What's a better way to implement this?
You can try this:
import sys
if write_to_file:
    out = open(file_name, 'w')
else:
    out = sys.stdout
# or a one-liner:
# out = open(file_name, 'w') if write_to_file else sys.stdout
for stuff in data:
    out.write(stuff)
out.flush()  # flush rather than close: sys.stdout should stay open
# open files are closed automatically when the interpreter exits
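You can also close the file instead of calling out.flush(), as long as you skip sys.stdout (calling close() on it would not raise an error; it would actually close standard output):
if out is not sys.stdout:
    out.close()
This is a bit more verbose, so flush() on its own will do just as well.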
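A sketch of the same idea wrapped in a context manager, assuming Python 3 (open_output and convert are hypothetical names). open_output yields sys.stdout without closing it, or opens a real file and closes it afterwards, so the caller can use a single with statement in both cases. Rows are still streamed one at a time, so the large input is never held in memory.

```python
import contextlib
import csv
import sys


@contextlib.contextmanager
def open_output(path=None):
    """Yield a writable file for path, or sys.stdout when path is None."""
    if path is None:
        yield sys.stdout  # borrowed stream: never closed here
    else:
        with open(path, "w") as f:
            yield f  # closed when the caller's with block ends


def convert(infile, outfile=None):
    """Join the space-separated fields of each row with '__'."""
    with open(infile, newline="") as tsv, open_output(outfile) as out:
        for row in csv.reader(tsv, delimiter=" "):
            out.write("__".join(row) + "\n")
```

From the command line this could be driven by convert(sys.argv[1], sys.argv[2] if len(sys.argv) > 2 else None), which avoids ever calling open('').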

Removing all BOMs from file with multiple BOMs

I have a text file containing multiple lines beginning with a byte order mark. Passing encoding='utf-8-sig' to open removes the BOM at the start of the file but all subsequent BOMs remain. Is there a more correct way to remove these than this:
import codecs
filepath = 'foo.txt'
bom_len = len(codecs.BOM_UTF8)
def remove_bom(s):
    s = str.encode(s)
    if codecs.BOM_UTF8 in s:
        s = s[bom_len:]
    return s.decode()
try:
    with open(filepath, encoding='utf-8-sig') as file_object:
        for line in file_object:
            line = line.rstrip()
            line = remove_bom(line)
            if line != '':
                print([line[0]])
except FileNotFoundError:
    print('No file found at ' + filepath)
I had a similar problem.
This helped me:
import codecs
with open(path, "rb") as infile:
    bytecontent = infile.read()
bytecontent = bytecontent.replace(codecs.BOM_UTF8, b"")
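A sketch along the same lines, assuming Python 3 and a UTF-8 file (read_without_boms and strip_boms are hypothetical names). You can either replace every BOM in the raw bytes before decoding, or, for text already read with encoding='utf-8-sig', remove the U+FEFF character that each stray BOM decodes to.

```python
import codecs

BOM_CHAR = "\ufeff"  # U+FEFF, what a UTF-8 BOM becomes after decoding


def read_without_boms(path):
    """Read a UTF-8 file and drop every BOM in it, not only the leading one."""
    with open(path, "rb") as infile:
        data = infile.read()
    data = data.replace(codecs.BOM_UTF8, b"")
    return data.decode("utf-8")


def strip_boms(text):
    """Text-level alternative for lines already decoded with utf-8-sig."""
    return text.replace(BOM_CHAR, "")
```

Unlike remove_bom in the question, neither helper assumes the BOM sits at the start of the line.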

Encoding utf-8 error, when stdin is used instead of command line arguments

I need to make my script usable with UNIX commands so that, for example, it can process two input files at once. This script works perfectly well with command-line arguments:
newlist = []
def f1()
    ....
    return places
    return persons
    return unknown
def f2(input_file):
    volume_id = sys.argv[3]
    for line in input_data:
        if any(place+'</dfn>' in line.decode('utf-8') for place in places):
            line = line.replace('"person"', '"place"')
            line = line.replace('id="', 'id="'+volume_id)
        elif any(unk+'</dfn>' in line.decode('utf-8') for unk in unknown):
            line = line.replace('"person"', '"undefined"')
            line = line.replace('id="', 'id="'+volume_id)
        elif 'class="person"' in line.decode('utf-8') and '<dfn' not in line:
            line = line.replace('class="person"', '')
            line = line.replace('id="', 'id="'+volume_id)
        elif 'id="' in line:
            line = line.replace('id="', 'id="'+volume_id)
        newlist.append(line)
    return newlist
def main():
    if len(sys.argv) < 4:
        print 'usage: ./myscript.py [file_in... file_out... volume_id]'
        sys.exit(1)
    else:
        filename = sys.argv[1]
        filename_out = sys.argv[2]
        tree = etree.parse(filename)
        extract(tree)
        input_file = open(filename, 'rU')
        change_class(input_file)
        file_new = open(filename_out, 'w')
        for x in newlist:
            if '\n' in x:
                x = x.replace('\n', '')
            print>>file_new, x
When I tried to add stdin/stdout support, I first had a problem with reading the same input file twice, so I made some changes so that it is actually opened only once. I modified the following:
def f2(input_data) #instead of input_file
and I modified main():
filename = sys.argv[1]
filename_out = sys.argv[2]
if filename == '-':
    input_file = sys.stdin
else:
    input_file = open(filename, 'rU')
if filename_out == '-':
    filename_out = sys.stdout
    file_new = filename_out
else:
    file_new = open(filename_out, 'w')
input_data = input_file.read()
tree = etree.fromstring(input_data)
extract(tree)
change_class(input_data)
for x in newlist:
    if '\n' in x:
        x = x.replace('\n', '')
    print>>file_new, x
I run the program from the command line:
./myscript.py - - volumeid < inputfile > outputfile
And now I get an encoding problem:
Traceback (most recent call last):
File "./exportXMLstd.py", line 192, in <module>
main()
File "./exportXMLstd.py", line 182, in main
change_class(input_data)
File "./exportXMLstd.py", line 135, in change_class
if any(place+'</dfn>' in line.decode('utf-8') for place in places):
File "./exportXMLstd.py", line 135, in <genexpr>
if any(place+'</dfn>' in line.decode('utf-8') for place in places):
File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe2 in position 0: unexpected end of data
What am I doing wrong?
How about this:
if filename == '-':
    input_file = sys.stdin
else:
    input_file = open(filename, 'rb')
tree = etree.fromstring(input_file.read())
...
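I think the XML source is most likely UTF-8 (whether it comes from stdin or from a file).
Note also that change_class now receives one big string, so for line in input_data iterates over single characters rather than lines. Calling .decode('utf-8') on one byte of a multi-byte UTF-8 character (0xe2 starts a three-byte sequence) fails with exactly this "unexpected end of data" error. Iterate over input_data.splitlines(True) instead, or decode the whole string once before looping.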
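A simplified sketch of change_class that avoids this error. It assumes Python 3 and takes the raw UTF-8 bytes of the document as input; relabel_lines and its parameters are hypothetical, and only the "place" branch of the original is kept. The data is decoded once and then iterated line by line instead of character by character.

```python
def relabel_lines(input_data, places, volume_id):
    """Decode the whole document once, then process it line by line.

    Iterating over the undecoded string itself would yield single characters
    (single bytes in Python 2), and one byte of a multi-byte UTF-8 character
    cannot be decoded on its own.
    """
    text = input_data.decode("utf-8")
    out = []
    for line in text.splitlines():
        if any(place + "</dfn>" in line for place in places):
            line = line.replace('"person"', '"place"')
        # every branch of the original also prefixes the id with volume_id
        line = line.replace('id="', 'id="' + volume_id)
        out.append(line)
    return out
```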

IOError: [Errno 2] No such file or directory Python

I have this piece of code that tries to find *.vm files and pass them to another module I wrote, which is supposed to read their lines.
This is the main file:
def VMTranslte(fileName):
    print "FILEOVER ",fileName
    from parser import Parser
    from codeWriter import CodeWriter
    if (fileName[-3:] == ".vm"):
        outputFile = fileName[:-3]+".asm"
        myWrite = CodeWriter(outputFile)
        myWrite.setFileName(fileName)
        myParser = Parser(fileName)
        myWrite.setFileName(fileName);
        translate(myParser,myWrite)
    else:
        if fileName[-1:] == "/":  # <===== CHECKS FOR DIRECTORY
            mystr = fileName.split('/')[-2]
            mystr = mystr.split('.')[0]+".asm"
            outputFile = fileName+mystr
        else:
            outputFile = fileName+".asm"
        myWrite = CodeWriter(outputFile)
        for child in os.listdir(fileName):
            if child.endswith('.vm'):  # <===== CHECK IF THERE IS *.vm FILE
                print "CHILD: ",child  # <===== PRINTS THE FILE WANTED (MEANING FINDS IT)
                myWrite.setFileName(child);
                myParser = Parser(child)  # <===== CALLS THE READER MODULE DESCRIBED AT THE BOTTOM
                translate(myParser,myWrite)
    myWrite.close()
The module that is supposed to read the lines:
#Constructor for Parser module.
def __init__(self,fileName):
    import re
    self.cmds = []
    self.counter = 0
    myFile = open(fileName, 'r')  # <=====ERROR OVER HERE
    fLines = myFile.readlines()
    for value in fLines :
        lineStrip = value.strip()
        if not (re.match("//",lineStrip) or len(lineStrip)==0):
            self.cmds.append(lineStrip)
The error is:
File "/Users/***/Desktop/dProj7/parser.py", line 19, in __init__
myFile = open(fileName, 'r')
IOError: [Errno 2] No such file or directory: 'BasicTest.vm'
It is clear that the script finds the file (it enters the loop), so what is going on here?
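os.listdir does not include the path, only the name of each file, so Parser('BasicTest.vm') looks for the file relative to the current working directory, where it does not exist. You probably want to call Parser with os.path.join(fileName, child) as the argument.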
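A small sketch of that fix. find_vm_files is a hypothetical helper, not part of the question's code. It builds full paths with os.path.join before they are passed on, so opening them works regardless of the current working directory.

```python
import os


def find_vm_files(directory):
    """Return the full paths of the .vm files directly inside directory."""
    return sorted(
        os.path.join(directory, name)  # listdir gives bare names only
        for name in os.listdir(directory)
        if name.endswith(".vm")
    )
```

In VMTranslte, each returned path could then be handed to Parser(path) in place of the bare child name.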
