This question isn't really about programming in general but about Python in particular. Say I've read a file:
f = open(file_name, "rb")
body = f.read()
f.close()
Since the file has already been read into the variable and closed, does it matter in terms of performance whether I use the variable body within the same method or pass it to another method? Is there enough information to answer my question?
Python doesn't create copies of objects passed to functions (including strings):
def getidof(s):
    return id(s)
s = 'blabla'
id(s) == getidof(s)  # True
So even passing a huge string doesn't affect performance. Of course you will have a slight overhead because you called a function, but the type of the argument and its length don't matter.
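To make the no-copy point concrete, here is a minimal sketch. The helper names `same_object` and `append_marker` and the 10 MB size are illustrative assumptions, not part of the question. It checks identity with `is` and shows that a mutation made inside a function is visible to the caller, which could not happen if the argument had been copied.

```python
def same_object(obj, original):
    # True only if the parameter is bound to the very same object
    return obj is original

def append_marker(items):
    # Mutates the caller's list in place; no copy was made on the way in
    items.append("marker")

body = b"x" * (10 * 1024 * 1024)  # roughly 10 MB of bytes
passed_same = same_object(body, body)

data = [1, 2, 3]
append_marker(data)

print(passed_same)  # True
print(data)         # [1, 2, 3, 'marker']
```

The cost of the call is the same whether `body` holds ten bytes or ten megabytes, because only a reference is handed over.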
I have a function that gets the user's preferred directory for use in many other functions:
def get_pref():
    q1 = "What is your pref. dir.?"
    path = input(q1)
    ...
    return path
Then, the path is used to specify locations in numerous functions (only 2 are shown here.)
Option A:
def do_stuff1():
    thing = path + file_name # using global path variable
    ...
def do_stuff2():
    thing = path + file_name # using global path variable
    ...
path = get_pref()
do_stuff1()
do_stuff2()
Option B:
def do_stuff1(path_):
    thing = path_ + file_name # accepts path as argument
    ...
def do_stuff2(path_):
    thing = path_ + file_name # accepts path as argument
    ...
path = get_pref()
do_stuff1(path)
do_stuff2(path)
Option A accesses the global variable path in each function. Option B accepts the path as an argument in each function.
B seems to be repetitive since the same variable is passed each time, but I know globals are strongly discouraged in Python. Would it be acceptable to declare the path as a global constant, or is there a more conventional way?
If you are writing a short script, you shouldn't be afraid to use globals. Globals are discouraged because of namespace pollution and lack of modularity. However, in a short script they will not significantly affect maintainability.
However, if you are producing a larger module, consider making a class out of your related functions and maintaining the path as an instance variable. You may even consider passing the path into the constructor of the class to make it clear to other engineers that the functionality of the class depends heavily on the path value. A setter and getter would also be recommended for this class attribute.
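One possible shape for such a class is sketched below. The class name `FileWorker`, the validation rule in the setter, and the method bodies are assumptions made for illustration; only the idea of passing the path to the constructor and exposing it through a getter and setter comes from the answer.

```python
import os

class FileWorker:
    """Hypothetical class; the name and methods are illustrative only."""

    def __init__(self, path):
        self.path = path  # assignment goes through the setter below

    @property
    def path(self):
        # Getter: read access to the stored directory
        return self._path

    @path.setter
    def path(self, value):
        # Setter: one place to validate the value (this rule is an assumption)
        if not value:
            raise ValueError("path must not be empty")
        self._path = value

    def do_stuff1(self, file_name):
        return os.path.join(self.path, file_name)

    def do_stuff2(self, file_name):
        return os.path.join(self.path, "backup_" + file_name)

worker = FileWorker("data")
first = worker.do_stuff1("a.txt")
second = worker.do_stuff2("a.txt")
```

In Python a property is usually preferred over explicit `get_path()` and `set_path()` methods, since callers keep writing `worker.path` while the class retains control over assignment.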
I prefer B) because I generally find it makes code easier to read and reason about.
You're right, though. Strict adherence to this means that every dependency that a function needs must not only be passed as an argument to that function, but must also be passed to every parent function up the call chain. This can complicate code, even if it does make clearer the "pipes of information" that are involved.
In this case though, I think A) is acceptable for a couple of reasons:
The data involved is immutable, so you know it can't change out from under you at odd times.
It seems like you're also only assigning it once, so you don't need to worry later about the reference changing either.
The key that I've always kept in mind is: globals are not inherently evil. Global, mutable state can be, though, because it's background information that you can't easily see but that potentially affects the operation of tests and other functions. As long as global state isn't changing (either through mutation of the object, or through reassignment of the reference), I personally don't have any issue with globals.
If you have data and functionality which necessarily interoperate, consider wrapping it in a class. The "path" can be stored as a variable in the class.
import os
class MyClass(object):
    def __init__(self, path):
        self.path = path
    def my_func(self):
        # use the instance attribute; a bare `path` is undefined here
        thing = os.path.join(self.path, "foo")
You can use a class to store your "global" like this, which is like a compromise between A and B.
# dc = data container
class dc():
    def __init__(self):
        self.path = "aaa-"
def do_stuff1(dc_object, file_name):
    dc_object.path = dc_object.path + file_name
Obj = dc()
do_stuff1(Obj, "test")
print(Obj.path)
#out: aaa-test
As a beginning programmer, you should avoid globals as much as possible, until you gain the experience to know when they're acceptable.
Option B is clean code: you should pass it as an argument to a function. And no, it is not repetitive code :)
Let's say I have code like this:
def read_from_file(filename):
    list = []
    for i in filename:
        value = i[0]
        list.append(value)
    return list
def other_function(other_filename):
    """
    That's where my question comes in. How can I get the list
    from the other function if I do not know the value "filename" will get?
    I would like to use the "list" in this function
    """
read_from_file("apples.txt")
other_function("pears.txt")
I'm aware that this code might not work or might not be perfect. But the only thing I need is the answer to my question in the code.
You have two general options. You can make your list a global variable that all functions can access (usually this is not the right way), or you can pass it to other_function (the right way). So
def other_function(other_filename, anylist):
    pass # your code here
somelist = read_from_file("apples.txt")
other_function("pears.txt", somelist)
You need to "catch" the value returned from the first function, and then pass that to the second function.
file_name = read_from_file('apples.txt')
other_function(file_name)
You need to store the returned value in a variable before you can pass it onto another function.
a = read_from_file("apples.txt")
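Putting the pieces together, a runnable sketch might look like the following. It assumes the question's loop was meant to collect the first character of each line of the file, and it writes a throwaway temporary file so that it runs anywhere; the names `first_chars`, `values`, `result_list` and `summary` are illustrative.

```python
import os
import tempfile

def read_from_file(filename):
    # Collect the first character of every non-empty line (an assumption
    # about what the question's loop was meant to do)
    first_chars = []
    with open(filename) as f:
        for line in f:
            if line.strip():
                first_chars.append(line[0])
    return first_chars

def other_function(other_filename, values):
    # Uses the list produced elsewhere without knowing which file it came from
    return "{0}: {1}".format(other_filename, "".join(values))

# Create a throwaway input file so the sketch does not depend on apples.txt
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("apple\nbanana\ncherry\n")

result_list = read_from_file(tmp.name)   # store the returned value...
summary = other_function("pears.txt", result_list)  # ...then pass it on
os.remove(tmp.name)

print(summary)  # pears.txt: abc
```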
There are at least three reasonable ways to achieve this and two which a beginner will probably never need:
Store the returned value of read_from_file and give it as a parameter to other_function (so adjust the signature to other_function(other_filename, whatever_list))
Make whatever_list a global variable.
Use an object and store whatever_list as a property of that object
(Use nested functions)
(Search for the value via the garbage collector, gc ;-) )
Nested functions
def foo():
    bla = "OK..."
    def bar():
        print(bla)
    bar()
foo()
Global variables
What are the rules for local and global variables in Python? (official docs)
Global and Local Variables
Very short example
Misc
You should not use list as a variable name, as you're shadowing the built-in list type.
You should use a descriptive name for your variables. What is the content of the list?
Using global variables can sometimes be avoided in a good way by creating objects. While I'm not always a fan of OOP, it sometimes is just what you need. Just have a look at one of the many tutorials, get familiar with it, and figure out whether it fits your task. (And don't use it all the time just because you can. Python is not Java.)
I was wondering if there was any difference between doing:
var1 = open(filename, 'w').write("Hello world!")
and doing:
var1 = open(filename, 'w')
var1.write("Hello world!")
var1.close()
I find that there is no need to close the file (in fact I get an AttributeError) if I try to run close() after using the first method (all in one line).
I was wondering if one way was actually any different/'better' than the other, and secondly, what is Python actually doing here? I understand that open() returns a file object, but how come running all of the code in one line automatically closes the file too?
Using the with statement is the preferred way:
with open(filename, 'w') as f:
    f.write("Hello world!")
It will ensure the file object is closed when the with block exits.
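A quick way to confirm this is to inspect the `closed` attribute once the block has finished. The sketch below uses a temporary directory and made-up variable names; it also raises an error on purpose inside a second block to show that the file is still closed in that case.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "hello.txt")

with open(path, "w") as f:
    f.write("Hello world!")
closed_after_block = f.closed  # the name f still exists, but the file is closed

try:
    with open(path) as g:
        raise RuntimeError("simulated failure inside the block")
except RuntimeError:
    pass
closed_after_error = g.closed  # closed even though the block raised

print(closed_after_block, closed_after_error)  # True True
```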
Let me explain why your first example won't work if you call the close() method afterwards. This will be useful for your future ventures into object-oriented programming in Python.
Example 1
When you run open(filename, 'w'), it initialises and returns a file object.
When you call open(filename, 'w').write('helloworld'), you are calling the write method on the file object that you just created. The write method does not return the file object (it returns None in Python 2 and the number of characters written in Python 3), so var1 in your code above is not a file object and has no close() method.
Example 2
Now in your second example, you are storing the file object as var1.
var1 will have the write method as well as the close method and hence it will work.
This is in contrast to what you have done in your first example.
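The difference between the two examples can be checked directly. This is a small sketch assuming Python 3 and a temporary file; `error_name` and `var2` are names introduced here for illustration.

```python
import os
import tempfile

filename = os.path.join(tempfile.mkdtemp(), "demo.txt")

# One-line form: var1 receives the return value of write(), not the file
var1 = open(filename, "w").write("Hello world!")

try:
    var1.close()
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__  # the value write() returned has no close()

# Two-step form: var2 is the file object itself, so close() exists
var2 = open(filename, "w")
var2.write("Hello world!")
var2.close()

print(type(var1).__name__, error_name, var2.closed)
```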
falsetru has provided a good example of how you can read and write a file using the with statement.
Reading and Writing file using the with statement
to write
with open(filename, 'w') as f:
    f.write("helloworld")
to read
with open(filename) as f:
    for line in f:
        pass  # do your stuff here
Using nested with statements to read/write multiple files at once
Hi, here's an update in response to your question in the comments. I'm not too sure if this is the most Pythonic way, but if you would like to use the with statement to read/write multiple files at the same time, you can nest the with statements within one another.
For instance:
with open('a.txt', 'r') as a:
    with open('b.txt', 'w') as b:
        for line in a:
            b.write(line)
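As an alternative to nesting, Python 2.7 and 3.1 onwards accept several context managers in a single with statement, separated by commas. The sketch below is a variation on the nested example, using a temporary directory and illustrative file names so it can run without existing files.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "a.txt")
dst_path = os.path.join(workdir, "b.txt")

# Prepare a small source file to copy from
with open(src_path, "w") as src:
    src.write("line 1\nline 2\n")

# Both files are opened by one with statement and both are closed on exit
with open(src_path, "r") as a, open(dst_path, "w") as b:
    for line in a:
        b.write(line)

with open(dst_path) as check:
    copied = check.read()
```

The behaviour is the same as the nested form; the single statement just avoids one level of indentation.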
How and Why
The file object itself is an iterator. Therefore, you can iterate through the file with a for loop. The file object has a next() method (__next__() in Python 3), which is called on each iteration until the end of the file is reached.
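The iterator behaviour can be observed by stepping through a file by hand with the built-in `next()` function, which is roughly what a for loop does behind the scenes. This sketch assumes Python 3 and a small temporary file; the variable names are illustrative.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")
with open(path, "w") as out:
    out.write("first\nsecond\n")

with open(path) as f:
    is_own_iterator = iter(f) is f   # a file object is its own iterator
    line_one = next(f)               # what a for loop does on each pass
    line_two = next(f)
    try:
        next(f)
        exhausted = False
    except StopIteration:
        exhausted = True             # this is the signal that ends a for loop
```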
The with statement was introduced in Python 2.5. Prior to Python 2.5, to achieve the same effect, one had to write:
f = open("hello.txt")
try:
    for line in f:
        print line,
finally:
    f.close()
Now the with statement does that automatically for you. The try and finally statements are in place to ensure that if any exception/error is raised in the for loop, the file will still be closed.
source : Python Built-in Documentation
Official documentation
Using the with statement, f.close() will be called automatically when it finishes. https://docs.python.org/2/tutorial/inputoutput.html
Happy venture into python
cheers,
biobirdman
@falsetru's answer is correct, in terms of telling you how you're "supposed" to open files. But you also asked what the difference was between the two approaches you tried, and why they do what they do.
The answer to those parts of your question is that the first approach doesn't do what you probably think it does. The following code
var1 = open(filename, 'w').write("Hello world!")
is roughly equivalent to
tmp = open(filename, 'w')
var1 = tmp.write("Hello world!")
del tmp
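Notice that the open() function returns a file object, and that file object has a write() method. But in Python 2 write() doesn't have any return value, so var1 winds up being None (in Python 3 it would be the number of characters written; either way it is not the file object). From the official Python 2 documentation for file.write(str):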
Write a string to the file. There is no return value. Due to buffering, the string may not actually show up in the file until the flush() or close() method is called.
Now, the reason you don't need to close() is that the main implementation of Python (the one found at python.org, also called CPython) happens to garbage-collect objects that no longer have references to them, and in your one-line version, you don't have any reference to the file object once the statement completes. You'll find that your multiline version also doesn't strictly need the close(), since all references will be cleaned up when the interpreter exits. But see answers to this question for a more detailed explanation about close() and why it's still a good idea to use it, unless you're using with instead.
Currently I am using argparse for parsing arguments in this fashion :
import argparse
outputFile = ""
input
def getArguments():
    parser = argparse.ArgumentParser(description='Execute the given pig script and pipe the output to given outFile.')
    parser.add_argument('-o','--outputFile', help='Output file where pig file execution output is stored.', required=True)
    input = parser.parse_args()
    print ("========================================")
    print ("Argument to the script")
    print ("outputFile = %s" % input.outputFile )
    return input
input = getArguments()
outputFile = input.outputFile
print ("outputFile = %s" % outputFile )
My question is: is there a better and/or more compact way of writing this parsing code?
Note: I am especially looking for a way to bind the parsed arguments to variables in the file. I hope not to use the "input" variable every time I access an input argument, nor do I want to declare an explicit variable just to copy each parameter from the argument string.
As @MartijinPieters points out in the comments, there is nothing wrong with options.arg1. Firstly, I want to echo this. There isn't a very clean (or clear IMHO) way of doing what you are asking for.
What you'll need to do is convert the options object into a dictionary. This can be done using vars:
opt_dict = vars(parser.parse_args())
Then you'll have to load all the values in the dictionary into the locals. I found this question that shows examples of how to do this. I like Ken's answer for its clarity:
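for name, value in opt_dict.items():  # iteritems() in Python 2
    # exec rather than eval: eval() cannot run an assignment statement;
    # !r quotes the value so that strings stay valid Python literals
    exec('{0} = {1!r}'.format(name, value))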
But it allows for the possibility of dangerous input sneaking in. Also, this will have to be done outside of the getArguments() function.
However, I agree that the accepted answer on the previously linked question is probably the best way of doing this, which is to use the opt_dict to update globals.
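A minimal sketch of the globals-based approach follows. The function name `get_arguments` and the fixed argument list standing in for the real command line are assumptions made so the example is repeatable; `vars()` and `globals().update()` are the standard calls.

```python
import argparse

def get_arguments(argv):
    parser = argparse.ArgumentParser(
        description="Execute the given pig script and pipe the output to given outFile.")
    parser.add_argument("-o", "--outputFile", required=True,
                        help="Output file where pig file execution output is stored.")
    return parser.parse_args(argv)

# A fixed argument list stands in for sys.argv[1:] so the sketch is repeatable
opt_dict = vars(get_arguments(["-o", "result.txt"]))

# Bind every parsed option to a module-level name of the same name
globals().update(opt_dict)

print(outputFile)  # result.txt
```

The trade-off is that names appear in the module without any visible assignment, so linters and readers cannot see where `outputFile` comes from, and an option named like an existing global would silently overwrite it.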
NOTE: If you are doing this for a Pig Python wrapper, like I suspect, then loading the variables into the locals will actually be counterproductive. The variables need to be passed into bind as a dictionary.
This is a newbie question, but I looked around and I'm having trouble finding anything specific to this question (perhaps because it's too simple/obvious to others).
So, I am working through Zed Shaw's "Learn Python the Hard Way" and I am on exercise 15. This isn't my first exposure to python, but this time I'm really trying to understand it at a more fundamental level so I can really do something with a programming language for once. I should also warn that I don't have a good background in object oriented programming or fully internalized what objects, classes, etc. etc. are.
Anyway, here is the exercise. The idea is to understand basic file opening and reading:
from sys import argv

script, filename = argv

txt = open(filename)

print "Here's your file %r:" % filename
print txt.read()

print "I'll also ask you to type it again:"
file_again = raw_input("> ")

txt_again = open(file_again)

print txt_again.read()

txt.close()
txt_again.close()
My question is, why are the open and read functions used differently?
For example, to read the example file, why don't/can't I type print read(txt) on line 8?
Why do I put a period after the variable and the function after that?
Alternatively, why isn't line 5 written txt = filename.open()?
This is so confusing to me. Is it simply that some functions have one syntax and others another syntax? Or am I not understanding something with respect to how one passes variables to functions?
Syntax
Specifically to the syntactical differences: open() is a function, read() is an object method.
When you call the open() function, it returns an object (first txt, then txt_again).
txt is an object of class file. Objects of class file are defined with the method read(). So, in your code above:
txt = open(filename)
Calls the open() function and assigns an object of class file into txt.
Afterwards, the code:
txt.read()
calls the method read() that is associated with the object txt.
Objects
In this scenario, it's important to understand that objects are defined not only as data entities, but also with built-in actions against those entities.
e.g. A hypothetical object of class car might be defined with methods like start_engine(), stop_engine(), open_doors(), etc.
So as a parallel to your file example above, code for creating and using a car might be:
my_car = create_car(type_of_car)
my_car.start_engine()
(Wikipedia entry on OOP.)
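For readers who want to run the analogy, here is one way the hypothetical car could be written. Everything in it (the `Car` class, its attributes, and the `create_car` function) is invented for illustration and simply mirrors the names used above.

```python
class Car:
    """Hypothetical class mirroring the car analogy above."""

    def __init__(self, type_of_car):
        self.type_of_car = type_of_car  # data held by the object
        self.engine_running = False

    def start_engine(self):
        # An action that operates on this particular car
        self.engine_running = True

    def stop_engine(self):
        self.engine_running = False

def create_car(type_of_car):
    # A plain function that returns an object, much like open()
    return Car(type_of_car)

my_car = create_car("hatchback")
my_car.start_engine()
```

Here `create_car` plays the role of open() and `start_engine` plays the role of read(): the first is called on its own, the second is called through the object it belongs to.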
To answer this you should have some understanding of object oriented programming.
open() is a normal function, and the first parameter is a string, with the path to the file. The return value of this function is an object.
The further work is done by using this object. An object also has functions, but they are called methods. These methods are called in the context of this object, and the dot connects the object with the method. So txt.read() means that you are calling the read method of the txt object.
But if you really want to understand this, you should have a look at OOP.
You're coming up against methods vs functions.
open is a global function, and it takes as its parameters simply the things that go between the brackets.
read is a method of file objects. The expression txt.read() calls the read method of the txt object. Under the hood, the txt object is passed as the first parameter of its read method. The read method will be defined something like this:
class File(object):
    def read(self):
        # do whatever here
        # self is whatever object appears to the left of the dot in foo.read
        pass
It follows from the above definition that you can only use a method like read on an object which has a read method defined for it.
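The "object is passed as the first parameter" point can be seen by calling the method through the class and handing over the object explicitly. The `File` class below is a toy stand-in with an invented `contents` attribute, not the real built-in file type.

```python
class File(object):
    """Toy stand-in for a file object; not the real built-in type."""

    def __init__(self, contents):
        self.contents = contents

    def read(self):
        return self.contents

txt = File("hello")

via_method = txt.read()        # usual spelling
via_class = File.read(txt)     # same call with self passed explicitly

try:
    read(txt)                  # there is no global function called read
    name_error = False
except NameError:
    name_error = True
```

Both spellings return the same value, while the bare `read(txt)` fails because read only exists as an attribute of the class.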