Is there a way to interrupt a shutil copytree operation in Python?

I'm fairly new to programming in general. I need to develop a program that can copy multiple directories at once, taking into account exceptions for multiple file types. I came across the shutil module, which offers the copytree and ignore_patterns functions. Here is a snippet of my code, which also uses the wxPython Multiple Directory Dialog:
import os
import wx
import wx.lib.agw.multidirdialog as MDD
from shutil import copytree
from shutil import ignore_patterns

app = wx.App(0)
dlg = MDD.MultiDirDialog(None, title="Custom MultiDirDialog", defaultPath=os.getcwd(),
                         agwStyle=MDD.DD_MULTIPLE | MDD.DD_DIR_MUST_EXIST)
dest = "Destination Path"

if dlg.ShowModal() != wx.ID_OK:
    dlg.Destroy()

paths = dlg.GetPaths()
ext = ['*.tiff', '*.raw', '*.p4p', '*.hkl', '*.xlsx']

for path in enumerate(paths):
    directory = path[1].replace('Local Disk (C:)', 'C:')
    copytree(directory, dest, ignore=ignore_patterns(directory, *ext))

dlg.Destroy()
app.MainLoop()
This code works well for me. At times I'll be copying terabytes' worth of data. Is there any way that shutil.copytree can be interrupted? I ask because the first time I ran this program, I accidentally selected a rather large directory and copied a ton of files (successfully!) and wanted to stop it :( . Once I get around this, I'll finally start on the GUI! If there is any more information I can provide, please let me know. Thanks in advance for any and all help!

You can run the copy in a separate Python process using the multiprocessing module. The code may look something like this:
import time
import shutil
from multiprocessing import Process

def cp(src: str, dest: str):
    shutil.copytree(src, dest)

if __name__ == '__main__':
    proc = Process(target=cp, args=('Downloads', 'Tmp'), daemon=True)
    proc.start()
    time.sleep(3)
    proc.terminate()
In my example the main process starts a child process, which does the actual copying, and terminates it after 3 seconds. You can also check whether the process is still running by calling its is_alive() method.
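For example, here is a minimal sketch (a variant of the answer's code, not part of it) that polls is_alive() and terminates the copy after a deadline; the 30-second limit is arbitrary and 'Downloads'/'Tmp' are placeholder paths:
import time
import shutil
from multiprocessing import Process

def cp(src, dest):
    shutil.copytree(src, dest)

if __name__ == '__main__':
    proc = Process(target=cp, args=('Downloads', 'Tmp'), daemon=True)
    proc.start()
    deadline = time.monotonic() + 30          # arbitrary cut-off for the demo
    while proc.is_alive() and time.monotonic() < deadline:
        time.sleep(0.5)                       # a GUI could poll like this
    if proc.is_alive():
        proc.terminate()                      # still copying: stop it
    proc.join()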

copytree accepts a copy_function parameter. If you pass a function that checks a flag, you can raise an exception to interrupt the operation.
from shutil import copytree, copy2

# set this flag to True to interrupt a copytree operation
interrupt = False

class Interrupt(Exception):
    """Interrupts the copy operation."""

def interruptable_copy(*args, **kwargs):
    if interrupt:
        raise Interrupt("Interrupting copy operation")
    return copy2(*args, **kwargs)

copytree(src, dst, copy_function=interruptable_copy)
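To make this concrete, here is a hedged, self-contained sketch of the same idea: a background thread (standing in for, say, a GUI cancel button) flips the flag after three seconds, and the copy aborts at the next file. 'big_source_dir' and 'dest_dir' are placeholder paths:
import threading
import time
import shutil

interrupt = False

class Interrupt(Exception):
    """Raised to abort the copy."""

def interruptable_copy(src, dst, **kwargs):
    if interrupt:
        raise Interrupt("Interrupting copy operation")
    return shutil.copy2(src, dst, **kwargs)

def cancel_later():
    # stand-in for a GUI "Cancel" handler
    global interrupt
    time.sleep(3)
    interrupt = True

threading.Thread(target=cancel_later, daemon=True).start()
try:
    shutil.copytree('big_source_dir', 'dest_dir', copy_function=interruptable_copy)
except Interrupt:
    print("Copy cancelled; the destination tree is incomplete.")
Note that files copied before the flag flips stay in place, so you may want to clean up the partial destination afterwards.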

Related

Never-ending file existence check loop

I'm sorry, I'm a complete beginner, but I'm very fascinated by scripting automation. I'm trying to check for the existence of a file that arrives once in a while. I want to read it and then delete it. I can't figure out how to keep this action running without a goto/label feature.
Can anyone advise me, please?
import os
import os.path
import time

path = os.path.exists('file.txt')

# loop
if path is True:
    print("File exists.")
    time.sleep(1)
    os.remove("file.txt")  # Remove the file.
    # Now I need to start the loop again.
else:
    print("File doesn't exist")
    time.sleep(1)
    # Now I need to start the loop again.
    # And keep it running forever.
This is what "while" loops are for.
import os.path
import time

while True:
    time.sleep(1)
    if os.path.exists('file.txt'):
        print("File exists.")
        os.remove("file.txt")  # remove the file
    else:
        print("File doesn't exist")
You can do this with a batch file. You don't need Python.
I think what you are looking for is a folder monitor that performs actions based on events in the folder. I recommend using the watchdog library in Python to monitor the folder for incoming or outgoing files, with handler functions doing the reading and deleting. Refer to the code below.
Code:
import time
import os
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

def on_created(event):
    print("A file has arrived:", os.path.basename(event.src_path))
    time.sleep(10)
    os.remove(event.src_path)

def on_deleted(event):
    print("File deleted")

if __name__ == "__main__":
    event_handler = FileSystemEventHandler()
    event_handler.on_created = on_created
    event_handler.on_deleted = on_deleted
    path = "A:/foldername"  # enter the folder path you want to monitor here
    observer = Observer()
    # With recursive=True the observer would also watch subdirectories
    # of any folder added to the path.
    observer.schedule(event_handler, path, recursive=False)
    observer.start()
    try:
        while True:
            time.sleep(15)  # sleep for 15 seconds
    except KeyboardInterrupt:
        observer.stop()  # Ctrl+C stops the observer; otherwise it runs forever
    observer.join()
Also, if you want to read the file before deleting it, add the file-reading code in the on_created method before os.remove, and the code should work fine for you.
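For instance, a hedged drop-in replacement for the on_created above (it reuses the same os and time imports; the one-second grace period is an assumption so the writer can finish before we read):
def on_created(event):
    print("A file has arrived:", os.path.basename(event.src_path))
    time.sleep(1)  # assumed grace period so the writer can finish
    with open(event.src_path, 'r') as f:
        contents = f.read()  # read the file before deleting it
    print(contents)
    os.remove(event.src_path)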

How to listen for a file change in urwid?

I want to remote-control a Python application which uses urwid for the user interface.
My idea was to create a file, pass its name as a command line argument to the application, and have the application read from the file whenever I write to it.
Urwid's event loop has a method watch_file(fd, callback).
This method is described as "Call callback() when fd has some data to read."
This sounds exactly like what I want, but it causes an infinite loop:
callback is executed as often as possible, despite the fact that the file is empty.
Even if I delete the file, callback is still called.
#!/usr/bin/env python3
import urwid
import atexit

def onkeypress(key, size=None):
    if key == 'q':
        raise urwid.ExitMainLoop()
    text.set_text(key)

def onfilechange():
    text.set_text(cmdfile.read())
    # clear the file so that I don't read already executed commands again
    # and don't run into an infinite loop - but I am doing that anyway
    with open(cmdfile.name, 'w') as f:
        pass

cmdfile = open('/tmp/cmd', 'rt')
atexit.register(cmdfile.close)

text = urwid.Text("hello world")
filler = urwid.Filler(text)
loop = urwid.MainLoop(filler, unhandled_input=onkeypress)
loop.watch_file(cmdfile, onfilechange)

if __name__ == '__main__':
    loop.run()
(My initial idea was to open the file only for reading instead of keeping it open all the time but fd has to be a file object, not a path.)
Urwid offers several different event loops.
By default, SelectEventLoop is used.
GLibEventLoop shows the same behaviour: it runs into an infinite loop.
AsyncioEventLoop instead throws an "operation not permitted" exception.
TwistedEventLoop and TornadoEventLoop would need additional software to be installed.
I have considered using the independent watchdog library, but it seems accessing the user interface from another thread would require writing a new loop; see this Stack Overflow question.
The answer to that question recommends polling instead, which I would prefer to avoid.
If urwid specifically provides a method to watch a file, I cannot believe that it does not work in any implementation.
So what am I doing wrong?
How do I react to a file change in a Python/urwid application?
EDIT:
I have tried using named pipes (and removed the code that clears the file), but visually the behaviour is the same: the app does not start.
Audibly, however, there is a big difference: it does not go into the infinite loop until I write to the file.
Before I write to the file, callback is not called, but the app is not started either; it just does nothing.
After I write to the file, it behaves as described above for regular files.
I have found the following workaround: read a named pipe in another thread, save each line in a queue, and poll in the UI thread to see if something is in the queue.
Create the named pipe with mkfifo /tmp/mypipe.
Then write to it with echo >>/tmp/mypipe "some text".
#!/usr/bin/env python3
import os
import threading
import queue
import urwid

class App:
    POLL_TIME_S = .5

    def __init__(self):
        self.text = urwid.Text("hello world")
        self.filler = urwid.Filler(self.text)
        self.loop = urwid.MainLoop(self.filler, unhandled_input=self.onkeypress)

    def watch_pipe(self, path):
        self._cmd_pipe = path
        self.queue = queue.Queue()
        threading.Thread(target=self._read_pipe_thread, args=(path,)).start()
        self.loop.set_alarm_in(0, self._poll_queue)

    def _read_pipe_thread(self, path):
        while self._cmd_pipe:
            with open(path, 'rt') as pipe:
                for ln in pipe:
                    self.queue.put(ln)
            self.queue.put("!! EOF !!")

    def _poll_queue(self, loop, args):
        while not self.queue.empty():
            ln = self.queue.get()
            self.text.set_text(ln)
        self.loop.set_alarm_in(self.POLL_TIME_S, self._poll_queue)

    def close(self):
        path = self._cmd_pipe
        # stop reading
        self._cmd_pipe = None
        with open(path, 'wt') as pipe:
            pipe.write("")
        os.remove(path)

    def run(self):
        self.loop.run()

    def onkeypress(self, key, size=None):
        if key == 'q':
            raise urwid.ExitMainLoop()
        self.text.set_text(key)

if __name__ == '__main__':
    a = App()
    a.watch_pipe('/tmp/mypipe')
    a.run()
    a.close()
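As a possible alternative (a hedged sketch, not part of the original workaround): urwid's MainLoop.watch_pipe(callback) creates an internal OS pipe, watches its read end, and invokes the callback in the UI thread with whatever bytes were written to the returned descriptor, so the reader thread only needs os.write() and the UI thread needs no polling. The pipe path /tmp/mypipe is reused from above:
#!/usr/bin/env python3
import os
import threading
import urwid

def onkeypress(key):
    if key == 'q':
        raise urwid.ExitMainLoop()

text = urwid.Text("hello world")
loop = urwid.MainLoop(urwid.Filler(text), unhandled_input=onkeypress)

def on_pipe_data(data):
    # called in the UI thread with the bytes written to write_fd
    text.set_text(data.decode())
    return True  # keep watching (returning False would close the pipe)

write_fd = loop.watch_pipe(on_pipe_data)

def feeder(path):
    # read the external named pipe in a thread, forward lines to urwid
    while True:
        with open(path, 'rb') as fifo:  # blocks until a writer opens the pipe
            for line in fifo:
                os.write(write_fd, line)

threading.Thread(target=feeder, args=('/tmp/mypipe',), daemon=True).start()
loop.run()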

Creating a Flag file

I'm relatively new to Python, so please forgive my early-level understanding!
I am working to create a kind of flag file. Its job is to monitor a Python executable: the flag file is constantly running and prints "Start" when the executable started, "Running" while it runs, and "Stop" when it stopped or crashed. If a crash occurs, I want it to be able to restart the script. So far I have this down for the restart:
from subprocess import run
from time import sleep

# Path and name of the script you are trying to start
file_path = "py"
restart_timer = 2

def start_script():
    try:
        # Make sure the 'python' command is available
        run("python " + file_path, check=True)
    except:
        # Script crashed, let's restart it!
        handle_crash()

def handle_crash():
    sleep(restart_timer)  # restarts the script after 2 seconds
    start_script()

start_script()
How can I implement this along with a flag file?
Not sure what you mean by "flag", but this minimally achieves what you want.
Main file main.py:
import subprocess
import sys
from time import sleep

restart_timer = 2
file_path = 'sub.py'  # file name of the other process

def start():
    try:
        # sys.executable -> same Python executable
        subprocess.run([sys.executable, file_path], check=True)
    except subprocess.CalledProcessError:
        sleep(restart_timer)
        return True
    else:
        return False

def main():
    print("starting...")
    monitor = True
    while monitor:
        monitor = start()

if __name__ == '__main__':
    main()
Then the process that gets spawned, called sub.py:
from time import sleep

sleep(1)
print("doing stuff...")
# comment out to see the change
raise ValueError("sub.py is throwing an error...")
Put those files into the same directory and run it with python main.py.
You can comment out the raised error in sub.py to see the main script terminate normally.
On a larger note, this example is not claiming to be a good way to achieve the robustness you need...
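Since the question also asks about the flag file itself, here is a hedged sketch (not from the original answer) that writes the monitor's status into a file around each run; the flag.txt name is an assumption, while the "Start"/"Running"/"Stop" strings come from the question:
import subprocess
import sys
from pathlib import Path
from time import sleep

FLAG = Path("flag.txt")  # assumed flag-file name
restart_timer = 2
file_path = 'sub.py'

def start():
    FLAG.write_text("Start")
    try:
        FLAG.write_text("Running")
        subprocess.run([sys.executable, file_path], check=True)
    except subprocess.CalledProcessError:
        FLAG.write_text("Stop")  # crashed; the caller will restart
        sleep(restart_timer)
        return True
    FLAG.write_text("Stop")      # exited normally
    return False

if __name__ == '__main__':
    while start():
        pass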

Python Open File path that changes dynamically

I have a file that is updated throughout the day, and its name changes with a timestamp of when the file was updated.
e.g.
\folder\todolist\grocerylist_2015-12-30-093000.csv
How can I open this file in python without manually updating the full file path?
You can use the glob module to handle this.
import glob
file_path = glob.glob('\\folder\\todolist\\grocerylist_*.csv')[0]
If there are more files and you only need the most recent, you can sort and take the last one:
file_path = sorted(glob.glob('\\folder\\todolist\\grocerylist_*.csv'))[-1]
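If the timestamp embedded in the name ever stops sorting lexicographically, picking the newest file by modification time is an alternative (a small sketch, not from the original answer):
import glob
import os

# newest matching file by filesystem modification time
file_path = max(glob.glob('\\folder\\todolist\\grocerylist_*.csv'),
                key=os.path.getmtime)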
Perhaps you could use the Python glob module to open the file with the prefix grocerylist_:
import glob
path = glob.glob('/folder/todolist/grocerylist_*.csv')[0]
If you want to watch for file creation, I would recommend using something like watchdog; this will always give you the latest file created that matches your pattern:
import sys
import time
import os
from watchdog.observers import Observer
from watchdog.events import PatternMatchingEventHandler

class Handler(PatternMatchingEventHandler):
    def __init__(self, patterns):
        super(Handler, self).__init__(patterns)

    def process(self, event):
        print(event.src_path, event.event_type)

    def on_created(self, event):
        self.process(event)

if __name__ == '__main__':
    args = sys.argv[1:]
    pth = args[0] if args else "./"
    observer = Observer()
    observer.schedule(Handler([os.path.join(pth, "grocerylist_*.csv")]), path=pth)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
When a file matching your pattern is created, on_created will be called, which calls process, and you can do whatever you like in that method:
~$ python3 watch.py &
[15] 21005
~$ touch grocerylist_12345.csv
./grocerylist_12345.csv created
If you want to open the file, just do it in process.
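For example, a hedged drop-in for the process method above that reads newly created files (it assumes the file is fully written by the time the event fires):
def process(self, event):
    print(event.src_path, event.event_type)
    if event.event_type == 'created':
        with open(event.src_path) as f:  # read the newly created file
            print(f.read())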

os.chdir between multiple python processes

I have a complex Python pipeline (whose code I can't change), calling multiple other scripts and other executables. The point is that it takes ages to run over 8000 directories while doing some scientific analyses. So I wrote a simple wrapper (it might not be the most efficient, but it seems to work) using the multiprocessing module.
from os import path, listdir, mkdir, system
from os.path import join as osjoin, exists, isfile
from GffTools import Gene, Element, Transcript
from GffTools import read as gread, write as gwrite, sort as gsort
from re import match
from multiprocessing import JoinableQueue, Process
from sys import argv, exit

# some absolute paths
inbase = "/.../abfgp_in"
outbase = "/.../abfgp_out"
abfgp_cmd = "python /.../abfgp-2.rev/abfgp.py"
refGff = "/.../B0510_manual_reindexed_noSeq.gff"

# the Queue
Q = JoinableQueue()
i = 0

# define number of processes
try: num_p = int(argv[1])
except ValueError: exit("Wrong CPU argument")

# This is the function calling the abfgp.py script,
# which in turn calls a lot of third-party software
def abfgp(id_, pid):
    out = osjoin(outbase, id_)
    if not exists(out): mkdir(out)
    # logfile
    log = osjoin(outbase, "log_process_%s" % (pid))
    try:
        # call the script
        system("%s --dna %s --multifasta %s --target %s -o %s -q >>%s" % (abfgp_cmd, osjoin(inbase, id_, id_ + ".dna.fa"), osjoin(inbase, id_, "informants.mfa"), id_, out, log))
    except:
        print "ABFGP FAILED"
        return

# parse the output
def extractGff(id_):
    # code not relevant
    pass

# function called by multiple processes, using the Queue
def run(Q, pid):
    while not Q.empty():
        try:
            d = Q.get()
            print "%s\t=>>\t%s" % (str(i - Q.qsize()), d)
            abfgp(d, pid)
            Q.task_done()
        except KeyboardInterrupt:
            exit("Interrupted Child")

# list of directories
genedirs = [d for d in listdir(inbase)]
genes = gread(refGff)
for d in genedirs:
    i += 1
    indir = osjoin(inbase, d)
    outdir = osjoin(outbase, d)
    Q.put(d)

# this loop creates the multiple processes
procs = []
for pid in range(num_p):
    try:
        p = Process(target=run, args=(Q, pid + 1))
        p.daemon = True
        procs.append(p)
        p.start()
    except KeyboardInterrupt:
        print "Aborting start of child processes"
        for x in procs:
            x.terminate()
        exit("Interrupted")

try:
    for p in procs:
        p.join()
except:
    print "Terminating child processes"
    for x in procs:
        x.terminate()
    exit("Interrupted")

print "Parsing output..."
for d in genedirs: extractGff(d)
Now the problem is that abfgp.py uses the os.chdir function, which seems to disrupt the parallel processing. I get a lot of errors stating that some (input/output) files/directories cannot be found for reading/writing, even though I call the script through os.system(), which I thought would prevent this by spawning a separate process.
How can I work around this chdir interference?
Edit: I might change os.system() to subprocess.Popen(cwd="...") with the right directory. I hope this makes a difference.
Thanks.
Edit 2
Do not use os.system(); use subprocess.call().
system("%s --dna %s --multifasta %s --target %s -o %s -q >>%s" % (abfgp_cmd, osjoin(inbase, id_, id_ + ".dna.fa"), osjoin(inbase, id_, "informants.mfa"), id_, out, log))
would translate to
subprocess.call((abfgp_cmd, '--dna', osjoin(inbase, id_, id_ + ".dna.fa"),
                 '--multifasta', osjoin(inbase, id_, "informants.mfa"),
                 '--target', id_, '-o', out, '-q'))  # without the log
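Building on the asker's own edit, here is a hedged sketch of the subprocess route with an explicit working directory per job. The run_abfgp name is invented for illustration, and splitting abfgp_cmd assumes it stays a two-token string like "python /.../abfgp.py":
import os
import subprocess

def run_abfgp(abfgp_cmd, inbase, id_, out, log):
    """Run one abfgp job with its own working directory (cwd=out)."""
    cmd = abfgp_cmd.split() + [  # "python /.../abfgp.py" -> two tokens
        '--dna', os.path.join(inbase, id_, id_ + '.dna.fa'),
        '--multifasta', os.path.join(inbase, id_, 'informants.mfa'),
        '--target', id_, '-o', out, '-q',
    ]
    with open(log, 'a') as logfile:
        # cwd= only affects this child process; any os.chdir inside
        # abfgp.py cannot disturb the parent or sibling workers
        return subprocess.call(cmd, cwd=out, stdout=logfile,
                               stderr=subprocess.STDOUT)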
Edit 1
I think the problem is that multiprocessing uses module names to serialize functions and classes.
This means that if you do import module, where module lives in ./module.py, and you then do something like os.chdir('./dir'), you would now need from .. import module.
The child processes inherit the folder of the parent process. This may be a problem.
Solutions
1. Make sure that all modules are imported (in the child processes) and only change the directory after that.
2. Insert the original os.getcwd() into sys.path to enable imports from the original directory. This must be done before any functions are called from the local directory.
3. Put all the functions you use inside a directory that can always be imported. The site-packages directory could be such a directory. Then you can do something like import module; module.main() to start what you do.
4. Use the hack below, which I do because I know how pickle works. Only use this if the other attempts fail.
The script prints:
serialized # the function runD is serialized
string executed # before the function is loaded the code is executed
loaded # now the function run is deserialized
run # run is called
In your case you would do something like this:
runD = evalBeforeDeserialize('__import__("sys").path.append({})'.format(repr(os.getcwd())), run)
p = Process(target=runD, args=(Q, pid+1))
This is the script:
# functions that you need
class R(object):
    def __init__(self, call, *args):
        self.ret = (call, args)
    def __reduce__(self):
        return self.ret
    def __call__(self, *args, **kw):
        raise NotImplementedError('this should never be called')

class evalBeforeDeserialize(object):
    def __init__(self, string, function):
        self.function = function
        self.string = string
    def __reduce__(self):
        return R(getattr, tuple, '__getitem__'), \
               ((R(eval, self.string), self.function), -1)

# code to show how it works
def printing():
    print('string executed')

def run():
    print('run')

runD = evalBeforeDeserialize('__import__("__main__").printing()', run)

import pickle
s = pickle.dumps(runD)
print('serialized')
run2 = pickle.loads(s)
print('loaded')
run2()
Please report back if these do not work.
You could determine which instance of the os library the unalterable program is using; then create a tailored version of chdir in that library that does what you need: prevent the directory change, log it, whatever. If the tailored behavior needs to apply only to the single program, you can use the inspect module to identify the caller and tailor the behavior in a specific way for just that caller.
Your options are limited if you truly can't alter the existing program; but if you can alter the libraries it imports, something like this could be a least-invasive way to skirt the undesired behavior.
The usual caveats apply when altering a standard library.
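A hedged sketch of that idea, wrapping os.chdir so each call is logged (or suppressed) before the real implementation runs; whether logging or blocking is appropriate depends on what abfgp.py expects:
import os

_real_chdir = os.chdir

def traced_chdir(path):
    print("os.chdir(%r) called" % (path,))  # log the call...
    _real_chdir(path)                       # ...or return early here to block it

os.chdir = traced_chdir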
