What is the most pythonic way to open a file? - python

I'm trying to clean up my code a little bit, and I have trouble figuring which of these 2 ways is considered the most pythonic one
import os
dir = os.path.dirname(__file__)
str1 = 'filename.txt'
f = open(os.path.join(dir,str1),'r')
Although the second seems to be cleanest one, I find the declaration of fullPath a bit too much, since it will only be used once.
import os
dir = os.path.dirname(__file__)
str1 = 'filename.txt'
fullPath = os.path.join(dir,str1)
f = open(fullPath,'r')
In general, is it a better thing to avoid calling functions inside of another call, even if it adds a line of code ?

with open('file path', 'a') as f:
data = f.read()
#do something with data
or
f = open(os.path.join(dir,str1),'r')
f.close()

file = open('newfile.txt', 'r')
for line in file:
print line
OR
lines = [line for line in open('filename')]
If file is huge, read() is definitively bad idea, as it loads (without size parameter), whole file into memory.
If your file is huge this will cause latency !
So, i don't recommend read() or readlines()

There are many ways to open files in python which goes to say that there really isn't really a pythonic way of doing it. It all just boils down to which method you see are most connivence, especially in regards to what you're actually trying to do with the file once its open.
Most users use the IDLE GUI "click" to open files because it allows them to view the current file and also make some alterations if there's a need for such.
Others might just rely on the command lines to perform the task, at the cost of not being able to do anything other than opening the file.
Using Command Lines:
% python myfile.py
note that in order for this to work you need to make sure the system is "looking" into the directory where your file is storied. Using the 'cd' is useful to finding you route there.
% python import myfile myfile.title
This method is known as the object.attribute method of opening files. This method is useful when the file you're opening has an operation that you would like to implement.
There are more ways than what's been stated above, be sure to consult the pyDocs for further details.

Related

Fastest Method to read many files line by line in Python

I have a conceptual question. I am new to Python and I am looking to do task that involves processing bigger log files. Some of these can get up to 5 and 6GB
I need to parse through many files in a location. These are text files.
I know of the with open() method, and recently just ran into pathlib. So I need to not only read the file line by line to extract values to upload into a DB, i also need to get file properties that Pathlib gives you and upload them as well.
Is it faster to use with open and underneath it, call a path object from which to read files... something like this:
for filename in glob('**/*.*', recursive=False):
fpath = Path(filename)
with open(filename, 'rb', buffering=102400) as logfile:
for line in logfile:
#regex operation
print(line)
Or would it be better to use Pathlib:
with Path("src/module.py") as f:
contents = open(f, "r")
for line in contents:
#regex operation
print(line)
Also since I've never used Pathlib to open files for reading. When it comes to this: Path.open(mode=’r’, buffering=-1, encoding=None, errors=None, newline=None)
What does newline and errors mean? I assume buffering here is the same as buffering in the with open function?
I also saw this contraption that uses with open in conjuction with Path object though how it works, I have no idea:
path = Path('.editorconfig')
with open(path, mode='wt') as config:
config.write('# config goes here')
pathlib is intended to be a more elegant solution to interacting with the file system, but it's not necessary. It'll add a small amount of fixed overhead (since it wraps other lower level APIs), but shouldn't change how performance scales in any meaningful way.
Since, as noted, pathlib is largely a wrapper around lower level APIs, you should know Path.open is implemented in terms of open, and the arguments all mean the same thing for both; reading the docs for the built-in open will describe the arguments.
As for the last bit of your question (passing a Path object to the built-in open), that works because most file-related APIs were updated to support any object that implements the os.PathLike ABC.

Is there a way to quickly create many files many sequential files (ex1 . . . ex 50) in Python?

I'm doing the exercises in Learn Python the Hard Way! and I give them names like exercise1.py, excercise2.py and they go up to 50. I'm using Sublime Text and Cygwin on Windows.
Conceptually the computer can do lots of repetitive things really fast. Is there a way make a thing that would create a bunch of files like excercise1.py up to 50 and or open all of them up so I don't have to waste time in between the exercises manually creating and opening blank pages?
Sorry I'm kind of overwhelmed and don't have the necessary lexicon to describe what I want to do briefly/ in technical terms.
Think about how you'd normally open a file:
file = open('exercise.py', 'w')
# do stuff to file, or just nothing
file.close()
Now we want to do this 50 times:
for i in range(50):
file = open('exercise.py', 'w')
file.close()
But each time we should replace 'exercise.py' with a different string. Namely, 'exercise1.py', 'exercise2.py', etc.
for i in range(50):
filename = # some expression involving i
file = open(filename, 'w')
file.close()
Can you think of what should come after filename =? A basic answer could involve the + operator for concatenating strings (ie, gluing them together back-to-back), and the str function that can turn a number to a string. A slightly more sophisticated answer might involve the % operator for string formatting or the str.format method
This will quickly solve your problem
for each in range(1,51):
f = open("exercise"+str(each)+".py","w")
f.close()
You're using Cygwin so why not just use bash?
for i in {1..50}
do
touch "exercise$i.py"
done

Python securely remove file

How can I securely remove a file using python? The function os.remove(path) only removes the directory entry, but I want to securely remove the file, similar to the apple feature called "Secure Empty Trash" that randomly overwrites the file.
What function securely removes a file using this method?
You can use srm to securely remove files. You can use Python's os.system() function to call srm.
You can very easily write a function in Python to overwrite a file with random data, even repeatedly, then delete it. Something like this:
import os
def secure_delete(path, passes=1):
with open(path, "ba+") as delfile:
length = delfile.tell()
with open(path, "br+") as delfile:
for i in range(passes):
delfile.seek(0)
delfile.write(os.urandom(length))
os.remove(path)
Shelling out to srm is likely to be faster, however.
You can use srm, sure, you can always easily implement it in Python. Refer to wikipedia for the data to overwrite the file content with. Observe that depending on actual storage technology, data patterns may be quite different. Furthermore, if you file is located on a log-structured file system or even on a file system with copy-on-write optimisation, like btrfs, your goal may be unachievable from user space.
After you are done mashing up the disk area that was used to store the file, remove the file handle with os.remove().
If you also want to erase any trace of the file name, you can try to allocate and reallocate a whole bunch of randomly named files in the same directory, though depending on directory inode structure (linear, btree, hash, etc.) it may very tough to guarantee you actually overwrote the old file name.
So at least in Python 3 using #kindall's solution I only got it to append. Meaning the entire contents of the file were still intact and every pass just added to the overall size of the file. So it ended up being [Original Contents][Random Data of that Size][Random Data of that Size][Random Data of that Size] which is not the desired effect obviously.
This trickery worked for me though. I open the file in append to find the length, then reopen in r+ so that I can seek to the beginning (in append mode it seems like what caused the undesired effect is that it was not actually possible to seek to 0)
So check this out:
def secure_delete(path, passes=3):
with open(path, "ba+", buffering=0) as delfile:
length = delfile.tell()
delfile.close()
with open(path, "br+", buffering=0) as delfile:
#print("Length of file:%s" % length)
for i in range(passes):
delfile.seek(0,0)
delfile.write(os.urandom(length))
#wait = input("Pass %s Complete" % i)
#wait = input("All %s Passes Complete" % passes)
delfile.seek(0)
for x in range(length):
delfile.write(b'\x00')
#wait = input("Final Zero Pass Complete")
os.remove(path) #So note here that the TRUE shred actually renames to file to all zeros with the length of the filename considered to thwart metadata filename collection, here I didn't really care to implement
Un-comment the prompts to check the file after each pass, this looked good when I tested it with the caveat that the filename is not shredded like the real shred -zu does
The answers implementing a manual solution did not work for me. My solution is as follows, it seems to work okay.
import os
def secure_delete(path, passes=1):
length = os.path.getsize(path)
with open(path, "br+", buffering=-1) as f:
for i in range(passes):
f.seek(0)
f.write(os.urandom(length))
f.close()

file won't write in python

I'm trying to replace a string in all the files within the current directory. for some reason, my temp file ends up blank. It seems my .write isn't working because the secondfile was declared outside its scope maybe? I'm new to python, so still climbing the learning curve...thanks!
edit: I'm aware my tempfile isn't being copied currently. I'm also aware there are much more efficient ways of doing this. I'm doing it this way for practice. If someone could answer specifically why the .write method fails to work here, that would be great. Thanks!
import os
import shutil
for filename in os.listdir("."):
file1 = open(filename,'r')
secondfile = open("temp.out",'w')
print filename
for line in file1:
line2 = line.replace('mrddb2.','shpdb2.')
line3 = line2.replace('MRDDB2.','SHPDB2.')
secondfile.write(line3)
print 'file copy in progress'
file1.close()
secondfile.close()
Just glancing at the thing, it appears that your problem is with the 'w'.
It looks like you keep overwriting, not appending.
So you're basically looping through the file(s),
and by the end you've only copied the last file to your temp file.
You'll may want to open the file with 'a' instead of 'w'.
Your code (correctly indented, though I don't think there's a way to indent it so it runs but doesn't work right) actually seems right. Keep in mind, temp.out will be the replaced contents of only the last source file. Could it be that file is just blank?
Firstly,
you have forgotten to copy the temp file back onto the original.
Secondly:
use sed -i or perl -i instead of python.
For instance:
perl -i -pe 's/mrddb2/shpdb2/;s/MRDDB2/SHPDB2/' *
I don't have the exact answer for you, but what might help is to stick some print lines in there in strategic places, like print each line before it was modified, then again after it was modified. Then place another one after the line was modified just before it is written to the file. Then just before you close the new file do a:
print secondfile.read()
You could also try to limit the results you get if there are too many for debugging purposes. You can limit string output by attaching a subscript modifier to the end, for example:
print secondfile.read()[:n]
If n = 100 it will limit the output to 100 characters.
if your code is actually indented as showed in the post, the write is working fine. But if it is failing, the write call may be outside the inner for loop.
Just to make sure I wasn't really missing something, I tested the code and it worked fine for me. Maybe you could try continue for everything but one specific filename and then check the contents of temp.out after that.
import os
for filename in os.listdir("."):
if filename != 'findme.txt': continue
print 'Processing', filename
file1 = open(filename,'r')
secondfile = open("temp.out",'w')
print filename
for line in file1:
line2 = line.replace('mrddb2.','shpdb2.')
line3 = line2.replace('MRDDB2.','SHPDB2.')
print 'About to write:', line3
secondfile.write(line3)
print 'Done with', filename
file1.close()
secondfile.close()
Also, as others have mentioned, you're just clobbering your temp.out file each time you process a new file. You've also imported shutil without actually doing anything with it. Are you forgetting to copy temp.out back to your original file?
I noticed sometimes it will not print to file if you don't have a file.close after file.write.
For example, this program never actually saves to file, it just makes a blank file (unless you add outfile.close() right after the outfile.write.)
outfile=open("ok.txt","w")
fc="filecontents"
outfile.write(fc.encode("utf-8"))
while 1:
print "working..."
#OP, you might also want to try fileinput module ( this way, you don't have to use your own temp file)
import fileinput
for filename in os.listdir("."):
for line in fileinput.FileInput(filename,inplace=1):
line = line.strip().replace('mrddb2.','shpdb2.')
line = line.strip().replace('MRDDB2.','SHPDB2.')
print line
set "inplace" to 1 for editing the file in place. Set to 0 for normal print to stdout

How to know when to manage resources in Python

I hope I framed the question right. I am trying to force myself to be a better programmer. By better I mean efficient. I want to write a program to identify the files in a directory and read each file for further processing. After some shuffling I got to this:
for file in os.listdir(dir):
y=open(dir+'\\'+file,'r').readlines()
for line in y:
pass
y.close()
It should be no surprise that I get an AttributeError since y is a list. I didn't think about that when I wrote the snippet.
I am thinking about this and am afraid that I have five open files (there are five files in the directory specified by dir.
I can fix the code so it runs and I explicitly close the files after opening them. I am curious if I need to or if Python handles closing the file in the next iteration of the loop. If so then I only need to write:
for file in os.listdir(dir):
y=open(dir+'\\'+file,'r').readlines()
for line in y:
pass
I am guessing that it(python) does handle this effortlessly. The reason I think that this might be handled is that I have changed the object/thing that y is referencing. When I start the second iteration there are no more memory references to the file that was opened and read using the readlines method.
Python will close open files when they get garbage-collected, so generally you can forget about it -- particularly when reading.
That said, if you want to close explicitely, you could do this:
for file in os.listdir(dir):
f = open(dir+'\\'+file,'r')
y = f.readlines()
for line in y:
pass
f.close()
However, we can immediately improve this, because in python you can iterate over file-like objects directly:
for file in os.listdir(dir):
y = open(dir+'\\'+file,'r')
for line in y:
pass
y.close()
Finally, in recent python, there is the 'with' statement:
for file in os.listdir(dir):
with open(dir+'\\'+file,'r') as y:
for line in y:
pass
When the with block ends, python will close the file for you and clean it up.
(you also might want to look into os.path for more pythonic tools for manipulating file names and directories)
Don't worry about it. Python's garbage collector is good, and I've never had a problem with not closing file-pointers (for read operations at least)
If you did want to explicitly close the file, just store the open() in one variable, then call readlines() on that, for example..
f = open("thefile.txt")
all_lines = f.readlines()
f.close()
Or, you can use the with statement, which was added in Python 2.5 as a from __future__ import, and "properly" added in Python 2.6:
from __future__ import with_statement # for python 2.5, not required for >2.6
with open("thefile.txt") as f:
print f.readlines()
# or
the_file = open("thefile.txt")
with the_file as f:
print f.readlines()
The file will automatically be closed at the end of the block.
..but, there are other more important things to worry about in the snippets you posted, mostly stylistic things.
Firstly, try to avoid manually constructing paths using string-concatenation. The os.path module contains lots of methods to do this, in a more reliable, cross-platform manner.
import os
y = open(os.path.join(dir, file), 'r')
Also, you are using two variable names, dir and file - both of which are built-in functions. Pylint is a good tool to spot things like this, in this case it would give the warning:
[W0622] Redefining built-in 'file'

Categories