Need clear explanation about how to download a file in python [closed]


Closed 11 years ago.
I needed to download a file from within a Python program, and someone told me to do this:
source = urllib2.urlopen("http://someUrl.com/somePage.html").read()
open("/path/to/someFile", "wb").write(source)
It works very well, but I would like to understand the code.
When you have something like
patatoe = 1
isn't that a variable?
And when you have something like:
blabla()
isn't that defining a function?
Please, I would LOVE to understand the code correctly.

The word "source" is a variable. When you call urllib2's urlopen function and pass it a URL, it opens that URL and returns a response object. Calling read() on that object reads the web page (i.e. downloads it). In your example, the two steps are combined into one line, so "source" ends up holding the page contents. See http://docs.python.org/library/urllib2.html
The second piece opens a file. The first argument is the path to the file. The "wb" part means that it will write in binary mode. If the file already exists, it will be overwritten. Normally, I would write it like this:
f = open("/path/to/someFile", "wb")
f.write(source)
f.close()
The way you're doing it is a shortcut. When that code is run and the script ends, the file is closed automatically. See also http://docs.python.org/tutorial/inputoutput.html#reading-and-writing-files
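For readers on Python 3, where urllib2 no longer exists, the same two steps can be sketched as follows. This is a minimal sketch, not the code from the question: it assumes Python 3's urllib.request module, and the function name download is made up for illustration. The with statements close both the connection and the file as soon as each block ends.

```python
import urllib.request


def download(url, dest_path):
    """Fetch url and save the raw bytes to dest_path (hypothetical helper)."""
    # urlopen() returns a response object; read() pulls the whole body into memory.
    with urllib.request.urlopen(url) as response:
        source = response.read()
    # "wb" = write, binary mode; an existing file is overwritten.
    with open(dest_path, "wb") as out_file:
        out_file.write(source)
    # Return the number of bytes written, purely as a convenience.
    return len(source)
```

With the placeholders from the question, a call would look like download("http://someUrl.com/somePage.html", "/path/to/someFile").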

You define a function using the def keyword:
def f():
    ...
Without it, you are simply calling the function. open(...) returns a file object, which you then use to write the data out. It's practically the same as this:
f = open(...)
f.write(source)
It isn't quite the same, though, since the variable f holds onto the file object until it goes out of scope, whereas calling open(...).write(source) creates a temporary reference to the file object that disappears immediately after write() returns. The consequence is that the single-line form will immediately flush and close the file (in CPython, which closes a file as soon as its last reference goes away), while the two-line form will keep the file open, and possibly some or all of the output buffered, until f goes out of scope.
You can observe this behaviour in the interactive shell:
>>> f = open('xxx', 'w')
>>> f.write('hello')
>>> open('yyy', 'w').write('world')
Now, without exiting the interactive shell, open another terminal window and check the contents of xxx and yyy. They'll both exist, but only yyy will have anything in it. Also, if you go back to Python and invoke f = None or del f, you'll find that xxx has now been written to.
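The same experiment can be sketched as a script. The immediate close of the one-line form relies on CPython's reference counting, so the sketch below closes the file explicitly instead. The function name and the idea of measuring the file size are illustrative assumptions, and how much data stays buffered depends on the platform and buffer size.

```python
import os


def sizes_around_close(path):
    """Return the on-disk size of path before and after close()."""
    f = open(path, "w")
    f.write("hello")
    # The text is typically still sitting in Python's buffer at this point.
    size_before = os.path.getsize(path)
    f.close()  # close() flushes the buffer to disk
    size_after = os.path.getsize(path)
    return size_before, size_after
```

On a typical setup the first number is smaller than the second, which matches what the two terminal windows show.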

The first line is assigning the result of downloading the file to the variable source. source is then written to disk.
To answer your broader points:
You're right that variables are assigned with an equals sign (=). What we're doing in that first line is assigning whatever we receive from the URL to the variable source.
Parentheses (()) are used to call functions which have been defined by def. To call a function means to ask the function to act. The things inside of the parentheses are called arguments.
You should start with Learn Python the Hard Way to get an understanding of what is happening.
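The three ideas above (assigning a variable, defining a function, calling a function) can be put side by side in a small sketch. The names are taken from the question; the body of blabla is made up for illustration.

```python
# Assignment: the name patatoe now refers to the value 1.
patatoe = 1


# Definition: def creates the function but does not run it.
def blabla():
    return "blabla was called"


# Call: the parentheses ask the function to act, and the value it
# returns is assigned to a variable, just like urlopen(...).read() above.
result = blabla()
```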

Here's a (hopefully understandable) explanation of the code I showed you the other day (How to download a file in python - feel free to comment here or on that question if you need any more details / explanation):
# Open a local file called "someFile.html" for writing (like opening notepad.exe but not entering text yet)
out_file = open("/path/to/someFile.html", "wb")
# Connect to the server at someUrl.com and ask for "somePage.html" - the socket sends the "GET /somePage.html HTTP/1.1" request.
# This is like typing the url in your browser window and (if there were an option for it) only getting the headers but not the page content yet.
conn = urllib2.urlopen("http://someUrl.com/somePage.html")
# Read the contents of the remote file "somePage.html".
# This is what actually gets data from the web server and
# saves the data into the 'pageSource' variable.
pageSource = conn.read()
# Write the data we got from the web page to our local file that we opened earlier: 'someFile.html'
out_file.write(pageSource)
# Close the local file so that everything is flushed to disk
out_file.close()

Related

Python: Json file become empty

Here is my code for accessing and editing the file:
def edit_default_settings(self, setting_type, value):
    with open("cam_settings.json", "r") as f:
        cam_settings = json.load(f)
    cam_settings[setting_type] = value
    with open("cam_settings.json", 'w') as f:
        json.dump(cam_settings, f, indent=4)
I use it in a program that runs for several hours a day, and about once a week I notice that the cam_settings.json file has become empty (literally empty, the file explorer shows 0 bytes), but I can't imagine how that is possible.
Would be glad to hear some comments on what could go wrong

I can't see any issues with the code itself, but there can be an issue with the execution environment. Are you running the code in a multi-threaded environment or running multiple instances of the same program at once?
This situation can arise if this code is executed in parallel and multiple threads or processes try to access the file at the same time. Try logging each call to the function and whether it completed successfully, and add exception handlers with error logging.
If this is the problem, using buffers or the singleton pattern can solve the issue.
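If several threads in the same process call the function, one way to serialize access is a lock around the whole read-modify-write cycle. This is a sketch of a different technique (threading.Lock), not the buffer or singleton approach mentioned above. The module-level lock and the standalone function signature with a path argument are assumptions, and a lock like this does not protect against a second process running the same program.

```python
import json
import threading

# One lock shared by every caller in this process (assumed to live at module level).
_settings_lock = threading.Lock()


def edit_default_settings(path, setting_type, value):
    # Hold the lock for the whole read-modify-write cycle so that no other
    # thread can open the file while it is truncated or half written.
    with _settings_lock:
        with open(path, "r") as f:
            cam_settings = json.load(f)
        cam_settings[setting_type] = value
        with open(path, "w") as f:
            json.dump(cam_settings, f, indent=4)
```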

As @Chels said, the file is truncated when it's opened with 'w'. That doesn't explain why it stays that way; I can only imagine that happening if your code crashed. Maybe you need to check logs for code crashes (or change how your code is run so that crash reasons get logged, if they aren't).
But there's a way to make this process safer in case of crashes. Write to a separate file and then replace the old file with the new file, only after the new file is fully written. You can use os.replace() for this. You could do this simply with a differently-named file:
with open(".cam_settings.json.tmp", 'w') as f:
    json.dump(cam_settings, f, indent=4)
os.replace(".cam_settings.json.tmp", "cam_settings.json")
Or you could use a temporary file from the tempfile module.
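A sketch of the tempfile variant, assuming Python 3.3 or later for os.replace(). The function name save_settings is hypothetical. The temporary file is created in the same directory as the target so that the final rename stays on one filesystem.

```python
import json
import os
import tempfile


def save_settings(path, cam_settings):
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp() creates a uniquely named file and returns an open descriptor.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(cam_settings, f, indent=4)
        # Swap the files only after the new one is fully written and closed.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, remove the partial temp file and leave the old file alone.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Note that mkstemp() creates the file readable and writable only by the current user, so the replaced file takes on those permissions.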

When you open a file with the "w" parameter, the existing content of the file is erased as soon as it is opened. (Whatever you write then replaces what was there before.)
Not sure if this is what you are looking for, but it could be one of the reasons why "cam_settings.json" becomes empty after the call to open("cam_settings.json", 'w')!
In such a case, to append some text, use the "a" parameter, as in:
open("cam_settings.json", 'a')
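The truncation happens at open() time, before anything is written, which is why a crash between open() and the end of json.dump() can leave a 0-byte file. A small sketch that shows this; the function name is made up for illustration.

```python
import os


def size_after_opening_for_write(path):
    """Open an existing file with 'w', write nothing, and report its size."""
    with open(path, "w"):
        pass  # nothing is written
    return os.path.getsize(path)
```

Called on a file that already holds settings, this returns 0: the old content is gone even though no write took place.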

how to delete a tempfile later

On python 2.7, I am currently using the following code to send data via a post request to a webpage (unfortunately, I cannot really change this). I prepare a string data which I prepare according to http://everydayscripting.blogspot.co.at/2009/09/python-jquery-open-browser-and-post.html, then write it to a file, and then open the file with webbrowser.open:
f = tempfile.NamedTemporaryFile(delete=False)
f.write(data)
f.close()
webbrowser.open(f.name)
time.sleep(1)
f.unlink(f.name)
However, I had to learn that sleeping a little sometimes is a little too little: I might delete the file before the data were submitted.
How can I avoid this?
One idea is, of course, to delete the file later, but when could this be? The whole thing is a method in a class; is there a method that is reliably executed on destruction? Or is it somehow possible to start the browser in such a way that it does not return until the tab is closed?
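One possible way to delete the file later, sketched under the assumption that cleaning up when the Python interpreter exits is acceptable, is to register the removal with atexit. The function names are hypothetical, and this does not wait for the browser; it only postpones the deletion until the program ends normally.

```python
import atexit
import os
import tempfile


def _remove_quietly(path):
    # The file may already be gone; ignore that case.
    try:
        os.unlink(path)
    except OSError:
        pass


def write_temp_page(data):
    """Write data (bytes) to a temp file that is deleted at interpreter exit."""
    f = tempfile.NamedTemporaryFile(suffix=".html", delete=False)
    try:
        f.write(data)
    finally:
        f.close()
    # Schedule the deletion for when the interpreter shuts down.
    atexit.register(_remove_quietly, f.name)
    return f.name
```

The returned path can then be passed to webbrowser.open(). Handlers registered with atexit are not run if the process is killed by a signal or exits through os._exit().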

Is there a more concise way to read csv files in Python?

with open(file, 'rb') as readerfile:
    reader = csv.reader(readerfile)
In the above syntax, can I perform the first and second line together? It seems unnecessary to use 2 variables ('readerfile' and 'reader' above) if I only need to use the latter.
Is the former variable ('readerfile') ever used?
Can I use the same variable name for both, or is that bad form?

You can do:
reader = csv.reader(open(file, 'rb'))
but that would mean you are not closing your file explicitly.

with open(file, 'rb') as readerfile:
The first line opens the file and stores the file object in readerfile. The with statement ensures that the file is closed when you exit the block by any means, including exceptions.
reader = csv.reader(readerfile)
The second line creates a CSV reader object using the file object. It needs the file object (otherwise where would it read the data from?). Of course you could conceivably store it in the same variable
readerfile = csv.reader(readerfile)
if you wanted to (and don't plan on using the file object again), but this will likely lead to confusion for readers of your code.
Note that you haven't read anything yet! You still need to iterate over the reader object in order to get the data that you're interested in, and if you close the file before that happens then the reader object won't work. The file object is used behind the scenes by the reader object, even if you "hide" it by overwriting the readerfile variable.
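The point about closing the file too early can be checked with a small sketch. It assumes Python 3 (hence newline="" instead of 'rb'), and the function name is made up for illustration.

```python
import csv


def read_after_close(path):
    with open(path, newline="") as readerfile:
        reader = csv.reader(readerfile)  # nothing has been read yet
    # The with block has ended, so readerfile is closed here.
    try:
        return list(reader)
    except ValueError as exc:
        # Iterating the reader now fails because the underlying file is closed.
        return "error: %s" % exc
```

Moving the list(reader) call inside the with block makes the same function return the rows instead of the error text.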
Lastly, if you really want to do everything on one line, you could conceivably define a function that abstracts the with statement:
def with1(context, func):
    with context as x:
        return func(x)
Now you can write this as one line:
data = with1(open(file, 'rb'), lambda readerfile: list(csv.reader(readerfile)))
It's by no means clearer, however.

This is not recommended at all.
Why is it important to use one line?
Most Python programmers are well aware of the benefits of the with statement. Keep in mind that readers may be lazy (that is, they read line by line) in some cases. You want to handle the file with the correct statement so that it is closed properly even if errors arise.
Nevertheless, you can use a one liner for this, as stated in other answers:
reader = csv.reader(open(file, 'rb'))

So basically you want a one-liner?
reader = csv.reader(open(file, 'rb'))
As said before, the problem with that is that with open() lets you do the following steps in one go:
Open the file
Do what you want with the file (inside your with block)
Close the file (this is implicit; you don't have to specify it)
If you don't use with open but call open directly, your file stays open until the object is garbage collected, and that could lead to unpredictable behaviour in some cases.
Plus, your original code (two lines) is much more readable than a one-liner.

If you put them together, then the file won't be closed automatically -- but that often doesn't really matter, since it will be closed automatically when the script terminates.
It's not common to need to reference the raw file once a csv.reader instance has been created from it (except possibly to explicitly close it if you're not using a with statement).
If you use the same variable name for both, it will probably work because the csv.reader instance will still hold a reference to the file object, so it won't be garbage collected until the program ends. It's not a common idiom, however.
Since csv files are often processed sequentially, the following can be a fairly concise way to do it, since the csv.reader instance frequently doesn't really need to be given a variable name, and it will close the file properly even if an exception occurs:
with open(file, 'rb') as readerfile:
    for row in csv.reader(readerfile):
        process the data...
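In Python 3 the 'rb' mode used in these snippets no longer works with the csv module, which expects a text-mode file opened with newline="". A minimal sketch of the same pattern under that assumption; the function name read_rows is hypothetical, and it collects the rows into a list so that they remain available after the file is closed.

```python
import csv


def read_rows(path):
    with open(path, newline="") as readerfile:
        # The reader is consumed inside the with block, while the file is open.
        return [row for row in csv.reader(readerfile)]
```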

Can I get File Modification Time from a file open for reading (python)

Do files opened like file("foo.txt") have any info about file modification time?
Basically I want to know if the file has been modified or replaced since a certain time, but if the file is replaced between checking modification time and opening the file, then you have inaccurate information.
How can I be sure?
Thanks.
UPDATE
@rubayeet: Thanks for the answer (+1), I actually didn't think of that. But... what should I do if the modification time has changed? Perhaps I reload the file. But what if it changes again during that time? If the file is being touched regularly I could end up in a loop forever! What I really want is a way to just get an open file handle and a modification time to go with it, without a potential infinite loop.
PS The answer you gave was actually plenty good enough for my purposes as the file won't be changed regularly; it's general interest on my part now.
UPDATE 2
Thinking the previous update through (and experimenting a little), I realize that simply knowing the file modification time at the point the file was opened is not much use: if the file is modified while you are reading it, you can have some or all of the modified data in what you read. So you'd have to open and read/process the whole file, then check mtime again (as per @rubayeet's answer) to see whether you may have stale data.

For simple modtimes you would use:
from os.path import getmtime
modtime = getmtime('/file/to/path')
If you want something like a callback functionality you could check the inotify bindings for python: pyinotify.
You essentially set up a watch manager, which notifies you in an event loop if any changes happen in the monitored directory. You register for specific events, like opening a file (which changes the modtime if written to).
If you are interested in exclusive access to a file, I would point to the fcntl module, which provides low-level file-locking mechanisms on file descriptors.

import os
filepath = '/path/to/file'
modifytime1 = os.path.getmtime(filepath)
fp = open(filepath)
modifytime2 = os.path.getmtime(filepath)
if modifytime1 != modifytime2:
    print("File modified after opening")
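To get a modification time that belongs to the file that was actually opened, one option is to ask the open descriptor rather than the path, using os.fstat(). A sketch with a hypothetical function name; it avoids the gap between checking and opening, but as noted in the question it does not detect changes made while the file is being read.

```python
import os


def open_with_mtime(path):
    """Open path and return (file object, mtime of that same open file)."""
    fp = open(path, "rb")
    # fstat() queries the already-open descriptor, so the result describes
    # this file even if the path is replaced afterwards.
    mtime = os.fstat(fp.fileno()).st_mtime
    return fp, mtime
```

The caller is responsible for closing the returned file object.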

Closing a file in python opened with a shortcut

I am just beginning with python with lpthw and had a specific question for closing a file.
I can open a file with:
input = open(from_file)
indata = input.read()
#Do something
input.close()
However, if I try to simplify the code into a single line:
indata = open(from_file).read()
How do I close the file I opened, or is it already automatically closed?
Thanks in advance for the help!

You simply have to use more than one line; however, a more pythonic way to do it would be:
with open(path_to_file, 'r') as f:
    contents = f.read()
Note that with what you were doing before, you could miss closing the file if an exception was thrown. The 'with' statement here will cause it to be closed even if an exception is propagated out of the 'with' block.

Files are automatically closed when the file object is no longer referenced. This is taken care of by Python's garbage collection.
In this case, the call to open() creates a file object, whose read() method is then run. After the method has executed, no reference to the object exists and it is closed (at least by the end of script execution).
Although this works, it is not good practice. It is always better to explicitly close a file, or (even better) to follow the with suggestion of the other answer.
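The guarantee that with closes the file even when an exception escapes the block can be checked directly. A small sketch, with a made-up function name and a simulated error:

```python
def closed_after_error(path):
    try:
        with open(path, "r") as f:
            f.read()
            raise RuntimeError("simulated failure inside the with block")
    except RuntimeError:
        # The with statement has already closed the file at this point.
        return f.closed
```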
