Read files with fileinput.input() from X line - python

Here is my question; it is probably not very complex, but I am still learning Python. I'm trying to read multiple files (all in the same format), and I have to start reading each of them from line 32. Somehow I can't find an efficient way to do so.
Here is my code so far:
for file in fileinput.input():
    entries = [f.strip().split("\t") for f in file].readlines()[32:]
which gives the error: AttributeError: 'list' object has no attribute 'readlines'
I know another possibility would be:
sources = open(sys.argv[1], "r").readlines()[32:]
and then running python3.2 script.py data/*.csv on the command line, but this does not seem to work properly.
I'd be thankful for any help.

You can use the openhook argument.
According to the module documentation:
You can control how files are opened by providing an opening hook via
the openhook parameter to fileinput.input() or FileInput(). The hook
must be a function that takes two arguments, filename and mode, and
returns an accordingly opened file-like object. Two useful hooks are
already provided by this module.
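import fileinput
def skip32(filename, mode):
    # Called once per input file: open it and discard its first 32 lines.
    f = open(filename, mode)
    for i in range(32):
        f.readline()
    return f
# fileinput takes the file names from sys.argv[1:] and opens each one through skip32.
entries = [line.strip().split('\t') for line in fileinput.input(openhook=skip32)]
BTW, the last line can be replaced with the following, which uses the csv module: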
import csv
entries = list(csv.reader(fileinput.input(openhook=skip32), delimiter='\t'))
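A variant that avoids a custom opening hook is to let fileinput track the line number within each file and skip lines inside the loop. The sketch below assumes Python 3.2 or later, where fileinput.input() can be used as a context manager. The names read_entries and skip are hypothetical and not part of the answer above.

```python
import fileinput


def read_entries(paths, skip=32):
    """Return tab-split rows from every file in paths, skipping the first `skip` lines of each."""
    entries = []
    with fileinput.input(files=paths) as stream:
        for line in stream:
            # filelineno() is the 1-based line number within the current file,
            # so it restarts at 1 whenever fileinput moves on to the next file.
            if stream.filelineno() <= skip:
                continue
            entries.append(line.rstrip("\n").split("\t"))
    return entries
```

Calling read_entries(sys.argv[1:]) would then mirror the command-line usage from the question.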

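It's just a small syntax error: .readlines() is being called on the list built by the comprehension instead of on the file.
entries = [f.strip().split("\t") for f in file].readlines()[32:]
should be:
entries = [f.strip().split("\t") for f in file.readlines()][32:]
Note that this requires file to be an open file object. Inside for file in fileinput.input(), each item is already a single line (a string), which has no readlines() method.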

Related

Use of python close command (LPTHW ex 17 extra credit) [duplicate]

I am having a great time trying to figure out why there doesn't need to be a closing attribute for these few lines of code I wrote:
from sys import argv
from os.path import exists
script, from_file, to_file = argv
file_content = open(from_file).read()
new_file = open(to_file, 'w').write(file_content)
new_file.close()
file_content.close()
I read some things and other people's posts about this, but their scripts were a lot more complicated than what I'm currently learning, so I couldn't figure out why.
I am doing Learning Python the Hard Way and would appreciate any help.
file_content is a string that holds the contents of the file; it has no connection to the file object itself. The file object created by open(from_file) is never bound to a name, so CPython closes it as soon as its reference count drops to zero (here, immediately after .read() returns). Other Python implementations may close it later, so it is safer to close files explicitly.
open(...) returns a file object. Calling read on it returns a string, and calling write on it returns None (in Python 3, the number of characters written). Neither of those has a close attribute.
>>> help(open)
Help on built-in function open in module __builtin__:
open(...)
    open(name[, mode[, buffering]]) -> file object
    Open a file using the file() type, returns a file object. This is the
    preferred way to open a file.
>>> a = open('a', 'w')
>>> help(a.read)
read(...)
    read([size]) -> read at most size bytes, returned as a string.
    If the size argument is negative or omitted, read until EOF is reached.
    Notice that when in non-blocking mode, less data than what was requested
    may be returned, even if no size parameter was given.
>>> help(a.write)
Help on built-in function write:
write(...)
    write(str) -> None. Write string str to file.
    Note that due to buffering, flush() or close() may be needed before
    the file on disk reflects the data written.
There are a couple of ways of remedying this:
>>> file = open(from_file)
>>> content = file.read()
>>> file.close()
or, with Python >= 2.6 (or 2.5 with from __future__ import with_statement):
>>> with open(from_file) as f:
...     content = f.read()
The with statement makes sure the file is closed.
When you do file_content = open(from_file).read(), you set file_content to the contents of the file (as read by read). You can't close this string. You need to save the file object separately from its contents, something like:
theFile = open(from_file)
file_content = theFile.read()
# do whatever you need to do
theFile.close()
You have a similar problem with new_file. You should separate the open(to_file) call from the write.
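Putting both pieces of advice together, the copy script could look roughly like the sketch below. It is written for Python 3 and assumes the files contain text. The helper name copy_text is hypothetical and does not come from the exercise.

```python
def copy_text(from_file, to_file):
    """Copy the text content of from_file into to_file and return what was copied."""
    # Keep the file object (src) separate from the string it yields (content).
    with open(from_file) as src:
        content = src.read()
    # Mode "w" creates or truncates the destination; the with block closes it.
    with open(to_file, "w") as dst:
        dst.write(content)
    return content
```

In the original script this would be called as copy_text(from_file, to_file) after unpacking argv. No explicit close() is needed, because each with block closes its file on exit.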

Reading command Line Args

I am running a script in python like this from the prompt:
python gp.py /home/cdn/test.in..........
Inside the script I need to take the path of the input file test.in, and the script should read and print the file's contents. The code below works fine, but the file path is hard-coded in the script. Now I want to pass the path as a command-line argument.
Working Script
#!/usr/bin/python
import sys
inputfile='home/cdn/test.in'
f = open (inputfile,"r")
data = f.read()
print data
f.close()
Script Not Working
#!/usr/bin/python
import sys
print "\n".join(sys.argv[1:])
data = argv[1:].read()
print data
f.close()
What change do I need to make?
While Brandon's answer is a useful solution, the reason your code is not working also deserves explanation.
In short, a list of strings is not a file object. In your first script, you open a file and operate on the resulting file object. But writing ['foo','bar'].read() does not make sense: lists aren't read()able, nor are strings, so 'foo'.read() fails as well. It would be similar to writing inputfile.read() in your first script.
To make things explicit, here is an example of getting all of the content from all of the files specified on the command line. This does not use fileinput, so you can see exactly what happens.
import sys
# iterate over the filenames passed on the command line
for filename in sys.argv[1:]:
    # open the file, assigning the file object to the variable 'f'
    with open(filename, 'r') as f:
        # print the content of this file
        print(f.read())
# Done.
Check out the fileinput module: it interprets command line arguments as filenames and hands you the resulting data in a single step!
http://docs.python.org/2/library/fileinput.html
For example:
import fileinput
for line in fileinput.input():
    print(line)
In the script that isn't working for you, you are simply not opening the file before reading it. So change it to
#!/usr/bin/python
import sys
print "\n".join(sys.argv[1:])
# open() needs a single path, so pass sys.argv[1] rather than the list sys.argv[1:]
f = open(sys.argv[1], "r")
data = f.read()
print data
f.close()
Also, the f.close() call in your script would raise an error because f was never defined. The above changes take care of that.
BTW, some coding standards recommend variable names at least three characters long. PEP 8 itself sets no minimum length, but descriptive names are easier to read.
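For reference, here is a minimal Python 3 sketch of reading the file named by the first command-line argument. The function name read_path_argument and the usage message are hypothetical. The script name gp.py is taken from the question.

```python
def read_path_argument(argv):
    """Return the content of the file named by the first command-line argument."""
    if len(argv) < 2:
        raise SystemExit("usage: gp.py <inputfile>")
    # argv[1] is a single path string; argv[1:] would be a list, which open() rejects.
    with open(argv[1], "r") as f:
        return f.read()
```

At the bottom of gp.py this would be used as print(read_path_argument(sys.argv)), after import sys.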

python: open two files as one fileobject

I have two files: a header and the body. I am using a library to read the whole thing. I can use "fileinput.input" to create one FileInput object and hand this to the library that reads the data. Problem is FileInput objects do not have a '.read' attribute which the library seems to expect.
I need a FileObject with a .read that is like reading both files as one.
Any ideas or existing workarounds? Yes, I know I can build my own little class or cat the files together. I'm just wondering if there is some magic FileObject joiner I've never heard of.
If your library reads from a file with .read(), there isn't much point in an abstraction that merges multiple file objects into one. It is trivial to read everything and put it into a StringIO.
If you just want to read the files line by line, try this:
def cat(*args):
    for arg in args:
        with open(arg,'r') as f:
            for line in f:
                yield line
for line in cat('/tmp/x1','/tmp/x2'):
    processLine(line)
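The StringIO approach mentioned in this answer could be sketched as follows. The code assumes Python 3 and text files small enough to fit in memory. The helper name open_concatenated is hypothetical.

```python
import io


def open_concatenated(*paths):
    """Return an in-memory file-like object holding the contents of paths, in order."""
    parts = []
    for path in paths:
        with open(path, "r") as f:
            parts.append(f.read())
    # io.StringIO supports read(), readline() and iteration like a text file.
    return io.StringIO("".join(parts))
```

The returned object can be handed to a library that calls .read(), at the cost of loading both files fully into memory first.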
File objects are already iterable line by line, so just use itertools to chain them into one big iterable.
import itertools
# HeaderFile and BodyFile are assumed to be already-open file objects
all_the_things = itertools.chain(HeaderFile, BodyFile)
for line in all_the_things:
    pass  # your code here

Python open() with minimal fluff variables

The intent is to look in a json file in the directory above the script and load up what it finds in that file. This is what I've got:
import os
import json
settings_file = '/home/me/foo/bar.txt'
root = os.path.dirname(os.path.dirname(os.path.abspath(settings_file))) # '/home/me'
target = os.path.join(root,'.extras.txt') # '/home/me/.extras.txt'
db_file= open(target)
databases = json.load(db_file) # works, returns object
databases2 = json.load(open(target)) # equivalent to above, also works
# try to condense code, lose pointless variables target and file
databases3 = json.load(open(os.path.join(root,'.extras.txt'))) # equivalent (I thought!) to above, doesn't work.
So... why doesn't the all-at-once, no-holding-variables version work? Oh, the error returned is (now in its entirety):
$ ./json_test.py
Traceback (most recent call last):
  File "./json_test.py", line 69, in <module>
    databases = json.load(open(os.path.join(root,'/.extras.txt')))
IOError: [Errno 2] No such file or directory: '/.extras.txt'
And to satisfy S.Lott's well-intentioned advice... it doesn't matter what target is set to. The databases and databases2 populate correctly while databases3 does not. target exists, is readable and contains what json expects to see. I suspect there's something I don't understand about the nature of stringing commands together... I can make the code work, was just wondering why the concise (or complex?) version failed.
The code looks fine; make sure the referenced files are in the appropriate places. Given your code that includes the target/file variable assignments, the full path to .extras.txt is
/home/me/.extras.txt
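The path reported in the traceback, '/.extras.txt', suggests that the failing line passed '/.extras.txt' (with a leading slash) to os.path.join, which is also what the traceback's source line shows. When a later component is absolute, os.path.join discards the components before it. Here is a small sketch of that behaviour. It uses posixpath, the implementation behind os.path on Unix-like systems, so the result does not depend on the platform. The root value is taken from the comments in the question.

```python
import posixpath

root = "/home/me"
# A relative second component is appended to root.
relative = posixpath.join(root, ".extras.txt")
# An absolute second component (leading slash) replaces everything before it.
absolute = posixpath.join(root, "/.extras.txt")
print(relative)  # /home/me/.extras.txt
print(absolute)  # /.extras.txt
```

That would explain why the versions that go through target work while the condensed one-liner, if it was typed with the leading slash, does not.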
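If the file does not exist yet and you want to create it, open it in write mode:
db_file = open(target, 'w')
By default open uses read mode ('r'), which raises IOError when the file is missing, while write mode ('w') creates it. Be aware that 'w' also truncates an existing file and that a file opened this way cannot be read, so json.load will not work on it. Use write mode only to create the file, then reopen it for reading.
Also, I would not use the variable name file, since it shadows a built-in type (<type 'file'>) in Python 2.
From the limited information in the question, it appears that /.extras.txt does not exist, which is what the IOError reports. Adding the write-mode flag to the one-liner would avoid the IOError, but json.load would then fail because the file is empty and not readable:
databases = json.load(open(os.path.join(root,'.extras.txt'), 'w'))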
Final note: you lose the handle to your open file in this line, since you are not storing it in a variable:
databases = json.load(open(os.path.join(root,'.extras.txt')))
How do you intend to close the file when you're finished with it?
You could do this with a context manager (Python >= 2.6, or 2.5 with from __future__ import with_statement). Open the file for reading here, since json.load has to read from it:
with open(os.path.join(root,'.extras.txt')) as f:
    databases = json.load(f)
which will take care of closing the file for you.
