How can I delay execution until after os.system finishes? - python

I am using os.system to copy a file from a system to another. The logic of a very simple program is to execute another set of commands after this file gets copied.
The problem is that os.system does not actually wait for the file to be copied, and gets to executing the next line. This causes issues to the system. I could actually give some wait functions, through time.sleep(), but we have to copy files with sizes ranging from 500 MB to sometimes 20 GB, and the times taken are very different.
What's the solution? I need to somehow tell my program that the files are copied, and then to execute the next line.

The first thing I'd try is to use shutil.copyfile() instead of an external program to copy the file. If you have to use an external program, you should call it via subprocess.Popen(), not via os.system(). You can then call Popen.wait() to block until the subprocess has finished.
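A minimal sketch of both suggestions, assuming Python 3. The function names copy_then_continue and run_and_wait are made up for illustration and are not part of any library:

```python
import shutil
import subprocess
import sys


def copy_then_continue(src, dst):
    """Copy src to dst; this returns only after the copy has finished."""
    shutil.copyfile(src, dst)  # blocks until the whole file is written
    return dst


def run_and_wait(argv):
    """Start an external command and block until it exits."""
    proc = subprocess.Popen(argv)  # argv is a list, so no shell is involved
    return proc.wait()             # exit status of the child process
```

Whatever comes after either call in your program only runs once the copy (or the external command) is done, so no time.sleep() guesswork is needed.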

I think you should rather use shutil.copyfile than os.system to copy a file.
(Edit: woops, copy, not move)

Use the shutil module for copying files. From the documentation:
The shutil module offers a number of high-level operations on files and collections of files. In particular, functions are provided which support file copying and removal.
Also, use the subprocess module instead of os.system(). From the documentation:
The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes. This module intends to replace several other, older modules and functions, such as: os.system
For a better answer, you need to provide more detail about what exactly you are trying to do and how (programmatically) you are stuck.

Related

how can I run python file from another file, then have the new file restart the first file?

So far I don't think this is actually possible, but basically what I am trying to do is have one python program call another and run it, like how you would use import.
But then I need to be able to go from the second file back to the beginning of the first.
Doing this with import doesn't work because the first program never closed and will be still running, so running it again will only return to where it left off when it ran the second file.
Without understanding a bit more about what you want to do, I would suggest looking into the threading or multiprocessing libraries. These should allow you to create multiple instances of a program or function.
This is vague and I'm not quite sure what you're trying to do, but you can also explore the subprocess module for Python. It will allow you to spawn new processes much as if you were starting them from the command line, and your program will also be able to talk to the child processes via stdin and stdout.
If you don't want to import any modules:
exec(open("file.py").read())
Otherwise:
import os
os.system('python file.py')
Or:
import subprocess
subprocess.call(['python', 'file.py'])
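A small sketch of the subprocess variant, assuming Python 3. It uses sys.executable so the child runs under the same interpreter as the parent; run_script is a hypothetical helper name:

```python
import subprocess
import sys


def run_script(path, *args):
    """Run a Python script in a fresh interpreter and return its exit code."""
    # sys.executable is the interpreter running this program, so the child
    # does not depend on whatever "python" happens to be on PATH.
    return subprocess.call([sys.executable, path, *args])
```

Because the child is a separate process, it starts from the top of its file every time it is launched, which is the behaviour import cannot give you.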

How to keep piping output of a program to a python script, if the program is restarted

I know I can read the output of another script in Python by e.g. calling some_program | print_input.py and using sys.stdin in print_input.py like this:
import sys
if __name__=='__main__':
    while True:
        print sys.stdin.read(1024)
But is it also possible to restart some_program and still get its output without restarting print_input.py?
The idea is that the script some_program may crash, so that I will have to restart it without losing the current state of print_input.py.
Additional info that might be needed:
Launching some_program from within print_input.py using e.g. subprocess is not an option unfortunately.
Low latency requirements, so no (long) blocking calls.
The output of some_program is massive.
I can't modify some_program.
The elegant/usual solution would be to use named pipes. Create a pipe using mkfifo, redirect the output of some_program to it, and the Python script can just read from the pipe. Both programs can be restarted without issues.
I'm not sure about performance, but no disk I/O should be involved (even though the pipe looks like a file).
Another possibility would be to create a temporary file in a tmpfs or ramfs filesystem, have some_program write to it, and have the Python script repeatedly try reading it. But IMO this is strictly worse than using pipes.
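A rough sketch of the reading side of the named-pipe approach, assuming Python 3 on a Unix-like system. The function name read_fifo, the handle_chunk callback and the max_sessions parameter are invented for this example:

```python
import os


def read_fifo(path, handle_chunk, chunk_size=65536, max_sessions=None):
    """Read from a named pipe, reopening it whenever the writer goes away."""
    if not os.path.exists(path):
        os.mkfifo(path)  # create the pipe if it is not there yet
    sessions = 0
    while max_sessions is None or sessions < max_sessions:
        # Opening for reading blocks until some writer opens the pipe.
        # buffering=0 makes read() return as soon as any data is available.
        with open(path, "rb", buffering=0) as fifo:
            while True:
                chunk = fifo.read(chunk_size)
                if not chunk:  # EOF: the writer exited or crashed
                    break
                handle_chunk(chunk)
        sessions += 1
```

The writer would then be started with its output redirected to the same path (for instance some_program > /path/to/pipe). When it crashes, the reader sees end-of-file, loops, and waits for the next writer while keeping all of its own state.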

Why use Python's os module methods instead of executing shell commands directly?

I am trying to understand what is the motivation behind using Python's library functions for executing OS-specific tasks such as creating files/directories, changing file attributes, etc. instead of just executing those commands via os.system() or subprocess.call()?
For example, why would I want to use os.chmod instead of doing os.system("chmod...")?
I understand that it is more "pythonic" to use Python's available library methods as much as possible instead of just executing shell commands directly. But, is there any other motivation behind doing this from a functionality point of view?
I am only talking about executing simple one-line shell commands here. When we need more control over the execution of the task, I understand that using subprocess module makes more sense, for example.
It's faster: os.system and subprocess.call create new processes, which is unnecessary for something this simple. In fact, os.system and subprocess.call with the shell argument usually create at least two new processes: the first one being the shell, and the second one being the command that you're running (if it's not a shell built-in like test).
Some commands are useless in a separate process. For example, if you run os.system("cd dir/"), it will change the current working directory of the child process, but not of the Python process. You need to use os.chdir for that.
You don't have to worry about special characters interpreted by the shell. os.chmod(path, mode) will work no matter what the filename is, whereas os.system("chmod 777 " + path) will fail horribly if the filename is something like ; rm -rf ~. (Note that you can work around this if you use subprocess.call without the shell argument.)
You don't have to worry about filenames that begin with a dash. os.chmod("--quiet", mode) will change the permissions of the file named --quiet, but os.system("chmod 777 --quiet") will fail, as --quiet is interpreted as an argument. This is true even for subprocess.call(["chmod", "777", "--quiet"]).
You have fewer cross-platform and cross-shell concerns, as Python's standard library is supposed to deal with that for you. Does your system have a chmod command? Is it installed? Does it support the parameters that you expect it to support? The os module will try to be as cross-platform as possible and documents when that is not possible.
If the command you're running has output that you care about, you need to parse it, which is trickier than it sounds, as you may forget about corner-cases (filenames with spaces, tabs and newlines in them), even when you don't care about portability.
It is safer. To give you an idea, here is an example script:
import os
file = raw_input("Please enter a file: ")
os.system("chmod 777 " + file)
If the input from the user was test; rm -rf ~ this would then delete the home directory.
This is why it is safer to use the built-in function.
It is also why you should prefer subprocess (with an argument list and no shell) over os.system.
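For comparison, a sketch of the safer alternatives, assuming Python 3 on a Unix-like system. set_mode and chmod_external are hypothetical helper names; the "--" argument relies on the usual POSIX convention that it ends option parsing:

```python
import os
import subprocess


def set_mode(path, mode=0o644):
    """Change permissions directly; the filename is never seen by a shell."""
    os.chmod(path, mode)


def chmod_external(path, mode="644"):
    """If an external chmod must be used: argument list, no shell, and '--'."""
    # "--" tells chmod that everything after it is a filename, so a path
    # such as "--quiet" is not mistaken for an option.
    return subprocess.call(["chmod", mode, "--", path])
```

With either function, a filename like test; rm -rf ~ is just an odd filename, not a second command.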
There are four strong cases for preferring Python's more-specific methods in the os module over using os.system or the subprocess module when executing a command:
Redundancy - spawning another process is redundant and wastes time and resources.
Portability - Many of the methods in the os module are available in multiple platforms while many shell commands are os-specific.
Understanding the results - Spawning a process to execute arbitrary commands forces you to parse the results from the output and understand if and why a command has done something wrong.
Safety - A process can potentially execute any command it's given. This is a weak design and it can be avoided by using specific methods in the os module.
Redundancy (see redundant code):
You're actually executing a redundant "middle-man" on your way to the eventual system calls (chmod in your example). This middle man is a new process or sub-shell.
From os.system:
Execute the command (a string) in a subshell ...
And subprocess is just a module to spawn new processes.
You can do what you need without spawning these processes.
Portability (see source code portability):
The os module's aim is to provide generic operating-system services and its description starts with:
This module provides a portable way of using operating system dependent functionality.
You can use os.listdir on both Windows and Unix. Trying to use os.system / subprocess for this functionality will force you to maintain two calls (for ls / dir) and check what operating system you're on. This is not as portable and will cause even more frustration later on (see Understanding the command's results).
Understanding the command's results:
Suppose you want to list the files in a directory.
If you're using os.system("ls") / subprocess.call(['ls']), all you can work with is the process's output (and neither call even returns it; you would have to capture it, for example with subprocess.check_output), which is basically a big string with the file names.
How can you tell a file with a space in its name from two files?
What if you have no permission to list the files?
How should you map the data to python objects?
These are only off the top of my head, and while there are solutions to these problems - why solve again a problem that was solved for you?
This is an example of following the Don't Repeat Yourself principle (often referred to as "DRY") by not repeating an implementation that already exists and is freely available to you.
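As a rough illustration of the listing example, assuming Python 3 (list_files is a made-up helper name): os.listdir already hands back one Python string per entry, so names containing spaces need no parsing, and a missing permission shows up as an exception rather than as text to interpret.

```python
import os


def list_files(directory):
    """Return the names of the regular files in directory, sorted."""
    return sorted(
        name
        for name in os.listdir(directory)  # one list item per directory entry
        if os.path.isfile(os.path.join(directory, name))
    )
```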
Safety:
os.system and subprocess are powerful. It's good when you need this power, but it's dangerous when you don't. When you use os.listdir, you know it cannot do anything other than list files or raise an error. When you use os.system or subprocess to achieve the same behaviour, you can potentially end up doing something you did not mean to do.
Injection Safety (see shell injection examples):
If you use input from the user as a new command you've basically given him a shell. This is much like SQL injection providing a shell in the DB for the user.
An example would be a command of the form:
# ... read some user input
os.system(user_input + " some continuation")
This can be easily exploited to run any arbitrary code using the input: NASTY COMMAND;# to create the eventual:
os.system("NASTY COMMAND; # some continuation")
There are many such commands that can put your system at risk.
For a simple reason: when you call a shell function, it creates a sub-shell which is destroyed after your command exits, so if you change directory in a shell, it does not affect your environment in Python.
Besides, creating a sub-shell is time-consuming, so running OS commands through a shell will hurt your performance.
EDIT
I had some timing tests running:
In [379]: %timeit os.chmod('Documents/recipes.txt', 0755)
10000 loops, best of 3: 215 us per loop
In [380]: %timeit os.system('chmod 0755 Documents/recipes.txt')
100 loops, best of 3: 2.47 ms per loop
In [382]: %timeit call(['chmod', '0755', 'Documents/recipes.txt'])
100 loops, best of 3: 2.93 ms per loop
The internal function runs more than 10 times faster.
EDIT2
There may be cases when invoking an external executable yields better results than Python packages. I just remembered a mail sent by a colleague of mine saying that the performance of gzip called through subprocess was much higher than the performance of a Python package he used. But certainly not when we are talking about standard OS packages emulating standard OS commands.
Shell calls are OS-specific, whereas Python os module functions are not, in most cases. They also avoid spawning a subprocess.
It's far more efficient. The "shell" is just another OS binary which contains a lot of system calls. Why incur the overhead of creating the whole shell process just for that single system call?
The situation is even worse when you use os.system for something that's not a shell built-in. You start a shell process which in turn starts an executable which then (two processes away) makes the system call. At least subprocess would have removed the need for a shell intermediary process.
It's not specific to Python, this. systemd is such an improvement to Linux startup times for the same reason: it makes the necessary system calls itself instead of spawning a thousand shells.

How to get open files of a subprocess?

I opened a subprocess which generates files, and I want to get the file descriptors of these files to call fsync on them.
So if I have code like this:
p = subprocess.Popen(['some_program'])
the process p generates some files.
I can get the process ID of the subprocess using:
p.pid
But how can I get the fds of these files to call flush and fsync() on them?
Actually, I found a utility called "lsof" (list open files), but it is not installed or supported on my system, so I did not investigate it further, as I really need a standard way.
Thanks
Each process has its own table of file descriptors. If you know that a child process has a certain file open with FD 8 (which is easy enough, just take a listing of /proc/<pid>/fd), when you do fsync(8) you are sync'ing a file of your process, not the child's.
The same applies to all functions that use file descriptors: read, write, dup, close...
To get the effect of fsync, you might call sync instead.
What you could do instead is implement some kind of an RPC mechanism. For example you could add a signal handler that makes the child run fsync on all open FDs when it receives SIGUSR1.
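A sketch of the signal-handler idea, assuming Python 3 on a Unix-like system and assuming you are able to change the child program (this code would live in the child, not in the parent). The names tracked_files, sync_tracked_files and install_sync_handler are invented here:

```python
import os
import signal

# Files the child has opened and wants to be able to sync on request.
tracked_files = []


def sync_tracked_files(signum, frame):
    """Signal handler: flush Python's buffers, then fsync each tracked file."""
    for f in tracked_files:
        f.flush()              # push buffered data to the OS
        os.fsync(f.fileno())   # ask the OS to write it to disk


def install_sync_handler():
    """Run sync_tracked_files whenever this process receives SIGUSR1."""
    signal.signal(signal.SIGUSR1, sync_tracked_files)
```

The parent would then request a sync with os.kill(p.pid, signal.SIGUSR1). The fsync still happens inside the child, on the child's own descriptors, which is the only place it can happen.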
If you want to use a packaged solution instead of going to /proc/pid/fd, an option is to use lsof or psutil.
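For reference, reading /proc/pid/fd directly takes only a few lines. This is a Linux-only sketch in Python 3, open_fds is a hypothetical name, and it only tells you which files a process has open; it does not let you fsync them:

```python
import os


def open_fds(pid):
    """Map fd number -> target path for a process, using Linux's /proc."""
    fd_dir = "/proc/%d/fd" % pid
    result = {}
    for name in os.listdir(fd_dir):
        try:
            # Each entry is a symlink to the file, pipe or socket behind the fd.
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            pass  # the fd was closed between listdir and readlink
    return result
```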
You can't fsync on behalf of another process. Also, you probably want flushing, not fsync. You can't flush on behalf of another process either. Rethink your requirements.

Is it safe to call os.unlink(__file__) in Python?

I'm using Python 2.6 on linux.
I have a run.py script which starts up multiple services in the background and generates kill.py to kill those processes.
Inside kill.py, is it safe to unlink itself when it's done its job?
import os
# kill services
os.unlink(__file__)
# is it safe to do something here?
I'm new to Python. My concern was that since Python is a scripting language, the whole script might not be in memory. After it's unlinked, there will be no further code to interpret.
I tried this small test.
import os
import time
time.sleep(10) # sleep 1
os.unlink(__file__)
time.sleep(10) # sleep 2
I ran stat kill.py when this file was being run and the number of links was always 1, so I guess the Python interpreter doesn't hold a link to the file.
As a higher level question, what's the usual way of creating a list of processes to be killed later easily?
Don't have your scripts write new scripts if you can avoid it: just write out a list of the PIDs, and then loop through them.
It's not very clear what you're trying to do, but creating and deleting scripts sounds like too much fragile magic.
To answer the question:
Python compiles all of the source and closes the file before executing it, so this is safe.
In general, unlinking an opened file is safe on Linux. (But not everywhere: on Windows you can't delete a file that is in use.)
Note that when you import a module, Python 2 compiles it into a .pyc bytecode file and interprets that. If you remove the .py file, Python will still use the .pyc, and vice versa.
Just don't call reload!
There's no need for Python to hold locks on the files since they are compiled and loaded at import time. Indeed, the ability to swap files out while a program is running is often very useful.
IIRC(!): On *nix an unlink only removes the name in the filesystem; the inode is removed when the last file handle is closed. Therefore this should not cause any problems, unless Python tries to reopen the file.
As a higher level question, what's the usual way of creating a list of processes to be killed later easily?
I would put the PIDs in a list and iterate over that with os.kill. I don't see why you're creating and executing a new script for this.
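Something along these lines, as a sketch assuming Python 3 on a Unix-like system. save_pids and kill_saved_pids are made-up names, and the PID file path is whatever run.py chooses:

```python
import os
import signal


def save_pids(pids, path):
    """Write one PID per line, e.g. from run.py after starting the services."""
    with open(path, "w") as f:
        f.write("\n".join(str(pid) for pid in pids))


def kill_saved_pids(path, sig=signal.SIGTERM):
    """Send sig to every PID listed in the file; return the PIDs read."""
    with open(path) as f:
        pids = [int(line) for line in f if line.strip()]
    for pid in pids:
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            pass  # that process has already exited
    return pids
```

This keeps the kill logic in one fixed script and turns the generated part into plain data.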
Python reads in a whole source file and compiles it before executing it, so you don't have to worry about deleting or changing your running script file.
