I'm trying to customise the SFTPOperator to download multiple files from a server. I know that the original SFTPOperator only allows one file at a time.
I copied the code from the source and tweaked it by adding a new function called get_xml_from_source(). Please refer to the code below:
import os
from datetime import datetime, time, timedelta

def get_xml_from_source(sftp_client, remote_filepath, local_filepath, prev_execution_date, execution_date):
    """
    Copy from Source to local path
    """
    files_attr = sftp_client.listdir_attr(remote_filepath)  # eg: /source/ HITTING ERROR HERE
    files_name = sftp_client.listdir(remote_filepath)  # eg: /source/
    today_midnight = datetime.combine(datetime.today(), time.min)
    yesterday_midnight = today_midnight - timedelta(days=1)
    for file_attr, file_name in zip(files_attr, files_name):
        modified_time = datetime.fromtimestamp(file_attr.st_mtime)
        if yesterday_midnight <= modified_time < today_midnight:
            # if prev_execution_date <= modified_time < execution_date:
            try:
                # Download each matching file to the local path
                sftp_client.get(os.path.join(remote_filepath, file_name),
                                os.path.join(local_filepath, file_name))
                print(file_name)
            except:  # pylint: disable=bare-except
                print("File not found")
        else:
            print("Not the file!")
This function is meant to download only the files modified between yesterday midnight and today midnight.
I added the function at this line:
with self.ssh_hook.get_conn() as ssh_client:
    sftp_client = ssh_client.open_sftp()
    if self.operation.lower() == SFTPOperation.GET:
        local_folder = os.path.dirname(self.local_filepath)
        if self.create_intermediate_dirs:
            # Create intermediate directories if they don't exist
            try:
                os.makedirs(local_folder)
            except OSError:
                if not os.path.isdir(local_folder):
                    raise
        file_msg = "from {0} to {1}".format(self.remote_filepath,
                                            self.local_filepath)
        self.log.info("Starting to transfer %s", file_msg)
        # This is where it starts to copy; customization begins here
        # sftp_client.get(self.remote_filepath, self.local_filepath)  # <-- original code that I commented out and replaced with mine below
        get_xml_from_source(sftp_client, self.remote_filepath,
                            self.local_filepath, self.prev_execution_date,
                            self.execution_date)
Note that the rest of the code is unchanged; it looks exactly as it does in the source.
I keep hitting an error at files_attr = sftp_client.listdir_attr(remote_filepath) with this message:
Error while transferring from /source/ to
/path/to/destination, error: [Errno 2] No such file.
Which obviously means it can't find the SFTP directory. Yet when I run the whole function locally, it works fine.
Is there any part of the code that ties the paramiko connection to fetching only one file? I checked the paramiko connection for SFTPOperator and it seems fine. In that case, how should I fix it?
This is how I establish my connection when running locally:
def connect_to_source():
    """
    Get source credentials
    :param: None
    :return: username & password
    """
    logging.basicConfig(stream=sys.stderr, level=logging.DEBUG)
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    username, password = get_eet_credentials()
    # key = paramiko.RSAKey.from_private_key_file(openssh_key, password=password)
    ssh.connect(hostname=SFTP_SERVER, port=SFTP_PORT_NUMBER,
                username=username, password=password)
    client = ssh.open_sftp()
    print("Connection to source success!")
    return client
Lastly, below is my airflow task:
def copy_from_source():
    """
    Copy XML file from source to local path
    """
    return SFTPOperator(
        task_id="copy_from_source",
        ssh_conn_id="source_conn",
        local_filepath=f"{current_dir}/destination",
        remote_filepath="/source/",
        prev_execution_date='{{ prev_execution_date }}',
        execution_date='{{ execution_date }}',  # strftime("%Y-%m-%d %H:%M:%S")
        create_intermediate_dirs=True,
        operation="get",
        dag=dag,
    )
I'm trying to do something similar. I'm not sure what is causing the issues you are facing, but this is the updated SFTP operator I have written that gets multiple files from a server.
sftp_get_multiple_files_operator.py
import os
from pathlib import Path
from typing import Any

from airflow.exceptions import AirflowException
from airflow.models import BaseOperator
from airflow.contrib.hooks import SSHHook


class SFTPGetMultipleFilesOperator(BaseOperator):

    template_fields = ('local_directory', 'remote_filename_pattern', 'remote_host')

    def __init__(
            self,
            *,
            ssh_hook=None,
            ssh_conn_id=None,
            remote_host=None,
            local_directory=None,
            remote_filename_pattern=None,
            filetype=None,
            confirm=True,
            create_intermediate_dirs=False,
            **kwargs,
    ) -> None:
        super().__init__(**kwargs)
        self.ssh_hook = ssh_hook
        self.ssh_conn_id = ssh_conn_id
        self.remote_host = remote_host
        self.local_directory = local_directory
        self.filetype = filetype
        self.remote_filename_pattern = remote_filename_pattern
        self.confirm = confirm
        self.create_intermediate_dirs = create_intermediate_dirs

    def execute(self, context: Any) -> str:
        file_msg = None
        try:
            if self.ssh_conn_id:
                if self.ssh_hook and isinstance(self.ssh_hook, SSHHook):
                    self.log.info("ssh_conn_id is ignored when ssh_hook is provided.")
                else:
                    self.log.info(
                        "ssh_hook is not provided or invalid. Trying ssh_conn_id to create SSHHook."
                    )
                    self.ssh_hook = SSHHook(ssh_conn_id=self.ssh_conn_id)

            if not self.ssh_hook:
                raise AirflowException("Cannot operate without ssh_hook or ssh_conn_id.")

            if self.remote_host is not None:
                self.log.info(
                    "remote_host is provided explicitly. "
                    "It will replace the remote_host which was defined "
                    "in ssh_hook or predefined in connection of ssh_conn_id."
                )
                self.ssh_hook.remote_host = self.remote_host

            with self.ssh_hook.get_conn() as ssh_client:
                sftp_client = ssh_client.open_sftp()
                all_files = sftp_client.listdir()
                self.log.info(f'Found {len(all_files)} files on server')
                timestamp = context['ds_nodash']
                filename_pattern = self.remote_filename_pattern + timestamp

                # fetch all files for the run date that match the filename pattern
                matching_files = [f for f in all_files
                                  if f.find(filename_pattern) != -1]

                # if a file type is specified, filter the matching files for it
                if self.filetype is not None:
                    matching_files = [filename for filename in matching_files
                                      if filename[-len(self.filetype):] == self.filetype]
                self.log.info(f'Found {len(matching_files)} files with name including {filename_pattern}')

                local_folder = os.path.dirname(self.local_directory)
                if self.create_intermediate_dirs:
                    Path(local_folder).mkdir(parents=True, exist_ok=True)

                for f in matching_files:
                    self.log.info(f"Starting to transfer from /{f} to {self.local_directory}/{f}")
                    sftp_client.get(f'/{f}', f'{self.local_directory}/{f}')

        except Exception as e:
            raise AirflowException(f"Error while transferring {file_msg}, error: {str(e)}")

        return self.local_directory


def _make_intermediate_dirs(sftp_client, remote_directory) -> None:
    """
    Create all the intermediate directories on a remote host.

    :param sftp_client: A Paramiko SFTP client.
    :param remote_directory: Absolute path of the directory containing the file
    :return:
    """
    if remote_directory == '/':
        sftp_client.chdir('/')
        return
    if remote_directory == '':
        return
    try:
        sftp_client.chdir(remote_directory)
    except OSError:
        dirname, basename = os.path.split(remote_directory.rstrip('/'))
        _make_intermediate_dirs(sftp_client, dirname)
        sftp_client.mkdir(basename)
        sftp_client.chdir(basename)
        return
dag.py
sftp_report = SFTPGetMultipleFilesOperator(
    task_id="sftp_reports_to_gcs",
    ssh_conn_id="sftp_connection",
    local_directory='/opt/airflow/dags/reports',
    remote_filename_pattern='reportname_',  # ds_nodash is appended in the operator by accessing the Airflow context
    create_intermediate_dirs=True,
    filetype='.csv',
)
I'm trying to get the list of files that are fully uploaded on an FTP server.
I have access to this FTP server, where a 3rd party writes data and marker files every 15 minutes. Once a data file is completely uploaded, a marker file gets created; once the marker file is there, we know the data files are ready and we can download them. I'm looking for an efficient way to approach this problem: check every minute whether there are any new stable files on the FTP server and, if there are, download them. One preferred rule is that if a marker file is at least 2 minutes old, we are good to download the marker file and its corresponding data file.
I'm new to Python and looking for help.
I have some code so far that lists the files:
import paramiko
from datetime import datetime, timedelta

FTP_HOST = 'host_address'
FTP_PORT = 21
FTP_USERNAME = 'username'
FTP_PASSWORD = 'password'
FTP_ROOT_PATH = 'path_to_dir'


def today():
    return datetime.strftime(datetime.now(), '%Y%m%d')


def open_ftp_connection(ftp_host, ftp_port, ftp_username, ftp_password):
    """
    Opens ftp connection and returns connection object
    """
    client = paramiko.SSHClient()
    client.load_system_host_keys()
    try:
        transport = paramiko.Transport(ftp_host, ftp_port)
    except Exception as e:
        return 'conn_error'
    try:
        transport.connect(username=ftp_username, password=ftp_password)
    except Exception as identifier:
        return 'auth_error'
    ftp_connection = paramiko.SFTPClient.from_transport(transport)
    return ftp_connection


def show_ftp_files_stat():
    ftp_connection = open_ftp_connection(FTP_HOST, int(FTP_PORT), FTP_USERNAME, FTP_PASSWORD)
    full_ftp_path = FTP_ROOT_PATH + "/" + today()
    file_attr_list = ftp_connection.listdir_attr(full_ftp_path)
    print(file_attr_list)
    for file_attr in file_attr_list:
        print(file_attr.filename, file_attr.st_size, file_attr.st_mtime)


if __name__ == '__main__':
    show_ftp_files_stat()
Sample file name
org-reference-delta-quotes.REF.48C2.20200402.92.1.1.txt.gz
Sample corresponding marker file name
org-reference-delta-quotes.REF.48C2.20200402.92.note.txt.gz
I solved my use case with the 2-minute stable rule: if the modified time is more than 2 minutes before the current time, I consider the file stable.
import logging
import time
from datetime import datetime, timezone
from ftplib import FTP

FTP_HOST = 'host_address'
FTP_PORT = 21
FTP_USERNAME = 'username'
FTP_PASSWORD = 'password'
FTP_ROOT_PATH = 'path_to_dir'

logger = logging.getLogger()
logger.setLevel(logging.ERROR)


def today():
    return datetime.strftime(datetime.now(tz=timezone.utc), '%Y%m%d')


def current_utc_ts():
    return datetime.utcnow().timestamp()


def current_utc_ts_minus_120():
    return int(datetime.utcnow().timestamp()) - 120


def yyyymmddhhmmss_string_epoch_ts(dt_string):
    return time.mktime(time.strptime(dt_string, '%Y%m%d%H%M%S'))


def get_ftp_connection(ftp_host, ftp_username, ftp_password):
    try:
        ftp = FTP(ftp_host, ftp_username, ftp_password)
    except Exception as e:
        print(e)
        logger.error(e)
        return 'conn_error'
    return ftp


def get_list_of_files(ftp_connection, date_to_process):
    full_ftp_path = FTP_ROOT_PATH + "/" + date_to_process + "/"
    ftp_connection.cwd(full_ftp_path)
    entries = list(ftp_connection.mlsd())
    entry_list = [line for line in entries if line[0].endswith('.gz') or line[0].endswith('.zip')]
    ftp_connection.quit()
    print('Total file count', len(entry_list))
    return entry_list


def parse_file_list_to_dict(entries):
    try:
        file_dict_list = []
        for line in entries:
            file_dict = dict({"file_name": line[0],
                              "server_timestamp": int(yyyymmddhhmmss_string_epoch_ts(line[1]['modify'])),
                              "server_date": line[0].split(".")[3]})
            file_dict_list.append(file_dict)
    except IndexError as e:
        # Output expected IndexErrors.
        logging.exception(e)
    except Exception as exception:
        # Output unexpected Exceptions.
        logging.exception(exception)
    return file_dict_list


def get_stable_files_dict_list(dict_list):
    stable_list = list(filter(lambda d: d['server_timestamp'] < current_utc_ts_minus_120(), dict_list))
    print('stable file count: {}'.format(len(stable_list)))
    return stable_list


if __name__ == '__main__':
    ftp_connection = get_ftp_connection(FTP_HOST, FTP_USERNAME, FTP_PASSWORD)
    if ftp_connection == 'conn_error':
        logger.error('Failed to connect FTP Server!')
    else:
        file_list = get_list_of_files(ftp_connection, today())
        parse_file_list = parse_file_list_to_dict(file_list)
        stable_file_list = get_stable_files_dict_list(parse_file_list)
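The pairing of a marker file with its data files isn't shown above. A minimal sketch of one way to do it, assuming the naming convention from the sample names holds (everything before .note. is the shared prefix):

# hedged sketch: pair a stable marker file with its data files by the shared
# prefix before ".note." -- based only on the sample names above
def data_files_for_marker(marker_name, all_names):
    prefix = marker_name.split('.note.')[0]
    return [name for name in all_names
            if name.startswith(prefix) and '.note.' not in name]

# usage with the sample names:
names = ['org-reference-delta-quotes.REF.48C2.20200402.92.1.1.txt.gz',
         'org-reference-delta-quotes.REF.48C2.20200402.92.note.txt.gz']
print(data_files_for_marker(names[1], names))  # -> the .92.1.1 data file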
What's the fastest way to serve static files in Python? I'm looking for something equal to, or close enough to, Nginx's static file serving.
I know of SimpleHTTPServer, but I'm not sure it can handle serving multiple files efficiently and reliably.
Also, I don't mind it being part of a lib/framework of some sort, as long as that lib/framework is lightweight.
EDIT: This project appears to be dead.
What about FAPWS3? One of the selling points:
Static file server
FAPWS can be used to serve a huge amount of static file requests. With the help of an async database in the backend, you can use FAPWS as your own Amazon S3.
If you're looking for a one-liner, you can do the following:
$> python -m SimpleHTTPServer
This will not fulfill all the requirements, but it's worth mentioning as the simplest way :-)
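For reference, on Python 3 the equivalent one-liner is:
$> python3 -m http.server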
I would highly recommend using a 3rd party HTTP server to serve static files.
Servers like nginx are heavily optimized for the task at hand, parallelized and written in fast languages.
Python is tied to one processor and interpreted.
The original SimpleHTTPServer from the Python standard library does NOT "handle serving multiple files efficiently and reliably". For instance, if you are downloading one file from it, another HTTP request to it will hang, since SimpleHTTPServer.py is a simple single-threaded HTTP server that can only support one connection at a time.
Fortunately, note that SimpleHTTPServer.py uses BaseHTTPServer.HTTPServer as its server class, which can be wrapped by SocketServer.ForkingMixIn or SocketServer.ThreadingMixIn (also from the Python standard library) to support multi-process and multi-thread modes, which greatly improves the simple HTTP server's efficiency and reliability.
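The core of that idea fits in a few lines; a minimal sketch, using the Python 2 standard-library module names:

import SocketServer
import BaseHTTPServer
import SimpleHTTPServer

# mix ThreadingMixIn into HTTPServer so each request is handled in its own thread
class ThreadingSimpleHTTPServer(SocketServer.ThreadingMixIn, BaseHTTPServer.HTTPServer):
    pass

ThreadingSimpleHTTPServer(("", 8000), SimpleHTTPServer.SimpleHTTPRequestHandler).serve_forever()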
Following this idea, a SimpleHTTPServer with multi-thread/multi-process support, modified from the original, is given as follows:
$ python2.7 ModifiedSimpleHTTPServer.py
usage: ModifiedSimpleHTTPServer.py [-h] [--pydoc] [--port PORT]
[--type {process,thread}] [--root ROOT]
[--run]
Modified SimpleHTTPServer with MultiThread/MultiProcess and IP bind support.
Original: https://docs.python.org/2.7/library/simplehttpserver.html
Modified by: vbem@163.com
optional arguments:
-h, --help show this help message and exit
--pydoc show this module's pydoc
run arguments:
--port PORT specify server port (default: 8000)
--type {process,thread}
specify server type (default: 'thread')
--root ROOT specify root directory (default: cwd '/home/vbem')
--run run http server foreground
NOTE: stdin for input, stdout for result, stderr for logging
For example, ModifiedSimpleHTTPServer.py --run --root /var/log --type process will run a multi-process HTTP static files server with '/var/log' as its root directory.
The modified code is:
#! /usr/bin/env python2.7
# -*- coding: utf-8 -*-
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
r"""Modified SimpleHTTPServer with MultiThread/MultiProcess and IP bind support.

Original: https://docs.python.org/2.7/library/simplehttpserver.html
Modified by: vbem@163.com
"""
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

import os, sys, pwd, posixpath, BaseHTTPServer, urllib, cgi, shutil, mimetypes, socket, SocketServer
from cStringIO import StringIO

USERNAME = pwd.getpwuid(os.getuid()).pw_name
HOSTNAME = socket.gethostname()
PORT_DFT = 8000


class SimpleHTTPRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):

    server_version = "SimpleHTTP/0.6"

    def do_GET(self):
        f = self.send_head()
        if f:
            self.copyfile(f, self.wfile)
            f.close()

    def do_HEAD(self):
        f = self.send_head()
        if f:
            f.close()

    def send_head(self):
        path = self.translate_path(self.path)
        f = None
        if os.path.isdir(path):
            if not self.path.endswith('/'):
                self.send_response(301)
                self.send_header("Location", self.path + "/")
                self.end_headers()
                return None
            for index in "index.html", "index.htm":
                index = os.path.join(path, index)
                if os.path.exists(index):
                    path = index
                    break
            else:
                return self.list_directory(path)
        ctype = self.guess_type(path)
        try:
            f = open(path, 'rb')
        except IOError:
            self.send_error(404, "File not found")
            return None
        self.send_response(200)
        self.send_header("Content-type", ctype)
        fs = os.fstat(f.fileno())
        self.send_header("Content-Length", str(fs[6]))
        self.send_header("Last-Modified", self.date_time_string(fs.st_mtime))
        self.end_headers()
        return f

    def list_directory(self, path):
        try:
            list = ['..'] + os.listdir(path)
        except os.error:
            self.send_error(404, "No permission to list directory")
            return None
        list.sort(key=lambda a: a.lower())
        f = StringIO()
        displaypath = cgi.escape(urllib.unquote(self.path))
        f.write('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">')
        f.write("<html>\n<title>%s %s</title>\n<body>" % (HOSTNAME, displaypath))
        f.write("%s@%s:<strong>%s</strong>\n" % (USERNAME, HOSTNAME, path.rstrip('/') + '/'))
        f.write("<hr>\n<ul>\n")
        for name in list:
            fullname = os.path.join(path, name)
            displayname = linkname = name
            if os.path.isdir(fullname):
                displayname = name + "/"
                linkname = name + "/"
            if os.path.islink(fullname):
                displayname = name + "#"
            f.write('<li><a href="%s">%s</a>\n'
                    % (urllib.quote(linkname), cgi.escape(displayname)))
        f.write("</ul>\n<hr>\n<pre>%s</pre>\n</body>\n</html>\n" % __doc__)
        length = f.tell()
        f.seek(0)
        self.send_response(200)
        encoding = sys.getfilesystemencoding()
        self.send_header("Content-type", "text/html; charset=%s" % encoding)
        self.send_header("Content-Length", str(length))
        self.end_headers()
        return f

    def translate_path(self, path):
        path = path.split('?', 1)[0]
        path = path.split('#', 1)[0]
        path = posixpath.normpath(urllib.unquote(path))
        words = path.split('/')
        words = filter(None, words)
        path = os.getcwd()
        for word in words:
            drive, word = os.path.splitdrive(word)
            head, word = os.path.split(word)
            if word in (os.curdir, os.pardir):
                continue
            path = os.path.join(path, word)
        return path

    def copyfile(self, source, outputfile):
        shutil.copyfileobj(source, outputfile)

    def guess_type(self, path):
        base, ext = posixpath.splitext(path)
        if ext in self.extensions_map:
            return self.extensions_map[ext]
        ext = ext.lower()
        if ext in self.extensions_map:
            return self.extensions_map[ext]
        else:
            return self.extensions_map['']

    if not mimetypes.inited:
        mimetypes.init()
    extensions_map = mimetypes.types_map.copy()
    extensions_map.update({'': 'text/plain'})


class ProcessedHTTPServer(SocketServer.ForkingMixIn, BaseHTTPServer.HTTPServer):
    r"""Handle requests in multi process."""


class ThreadedHTTPServer(SocketServer.ThreadingMixIn, BaseHTTPServer.HTTPServer):
    r"""Handle requests in a separate thread."""


SERVER_DICT = {
    'thread': ThreadedHTTPServer,
    'process': ProcessedHTTPServer,
}
SERVER_DFT = 'thread'


def run(sCwd=None, sServer=SERVER_DFT, nPort=PORT_DFT, *lArgs, **dArgs):
    r"""Run the HTTP server until interrupted."""
    sys.stderr.write('start with %r\n' % sys._getframe().f_locals)
    if sCwd is not None:
        os.chdir(sCwd)
    cServer = SERVER_DICT[sServer]
    oHttpd = cServer(("", nPort), SimpleHTTPRequestHandler)
    sys.stderr.write('http://%s:%s/\n' % (HOSTNAME, nPort))
    oHttpd.serve_forever()

# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
# main


def _main():
    r"""Main."""
    import argparse
    oParser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawTextHelpFormatter,
        epilog='NOTE: stdin for input, stdout for result, stderr for logging',
    )
    oParser.add_argument('--pydoc', action='store_true',
                         help="show this module's pydoc",
                         )
    oGroupR = oParser.add_argument_group(title='run arguments', description='')
    oGroupR.add_argument('--port', action='store', type=int, default=PORT_DFT,
                         help='specify server port (default: %(default)r)',
                         )
    oGroupR.add_argument('--type', action='store', default=SERVER_DFT, choices=SERVER_DICT.keys(),
                         help='specify server type (default: %(default)r)',
                         )
    oGroupR.add_argument('--root', action='store', default=os.getcwd(),
                         help='specify root directory (default: cwd %(default)r)',
                         )
    oGroupR.add_argument('--run', action='store_true',
                         help='\n'.join((
                             'run http server foreground',
                         )))
    oArgs = oParser.parse_args()
    if oArgs.pydoc:
        help(os.path.splitext(os.path.basename(__file__))[0])
    elif oArgs.run:
        return run(sCwd=oArgs.root, sServer=oArgs.type, nPort=oArgs.port)
    else:
        oParser.print_help()
        return 1
    return 0


if __name__ == "__main__":
    exit(_main())
Meanwhile, this single Python file of only about 200 lines may satisfy your "in Python" and "lightweight" demands.
Last but not least, ModifiedSimpleHTTPServer.py may be handy for temporary use; however, Nginx is still advised for long-term use.
I'm attempting to debug a Subversion post-commit hook that calls some python scripts. What I've been able to determine so far is that when I run post-commit.bat manually (I've created a wrapper for it to make it easier) everything succeeds, but when SVN runs it one particular step doesn't work.
We're using CollabNet SVNServe, which I know from the documentation removes all environment variables. This had caused some problems earlier, but shouldn't be an issue now.
Before Subversion calls a hook script, it removes all variables - including $PATH on Unix, and %PATH% on Windows - from the environment. Therefore, your script can only run another program if you spell out that program's absolute name.
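One way to confirm what the hook actually sees is to dump the environment and arguments at the top of the called script and compare a manual run with one started by svnserve; a hedged debugging sketch (the log filename is made up, and an absolute path is safer since the hook's working directory may differ):

# hedged debugging sketch: put near the top of svn2ftp.py
import os
import sys

with open('svn2ftp.env.log', 'a') as dbg:  # prefer an absolute path here
    dbg.write('argv: %r\n' % (sys.argv,))
    dbg.write('cwd: %r\n' % os.getcwd())
    dbg.write('PATH: %r\n' % os.environ.get('PATH'))  # expect None/empty under the hook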
The relevant portion of post-commit.bat is:
echo -------------------------- >> c:\svn-repos\company\hooks\svn2ftp.out.log
set SITENAME=staging
set SVNPATH=branches/staging/wwwroot/
"C:\Python3\python.exe" C:\svn-repos\company\hooks\svn2ftp.py ^
--svnUser="svnusername" ^
--svnPass="svnpassword" ^
--ftp-user=ftpuser ^
--ftp-password=ftppassword ^
--ftp-remote-dir=/ ^
--access-url=svn://10.0.100.6/company ^
--status-file="C:\svn-repos\company\hooks\svn2ftp-%SITENAME%.dat" ^
--project-directory=%SVNPATH% "staging.company.com" %1 %2 >> c:\svn-repos\company\hooks\svn2ftp.out.log
echo -------------------------- >> c:\svn-repos\company\hooks\svn2ftp.out.log
When I run post-commit.bat manually, for example: post-commit c:\svn-repos\company 12345, I see output like the following in svn2ftp.out.log:
--------------------------
args1: c:\svn-repos\company
args0: staging.company.com
abspath: c:\svn-repos\company
project_dir: branches/staging/wwwroot/
local_repos_path: c:\svn-repos\company
getting youngest revision...
done, up-to-date
--------------------------
However, when I commit something to the repo and it runs automatically, the output is:
--------------------------
--------------------------
svn2ftp.py is a bit long, so I apologize but here goes. I'll have some notes/disclaimers about its contents below it.
#!/usr/bin/env python
"""Usage: svn2ftp.py [OPTION...] FTP-HOST REPOS-PATH
Upload to FTP-HOST changes committed to the Subversion repository at
REPOS-PATH. Uses svn diff --summarize to only propagate the changed files
Options:
-?, --help Show this help message.
-u, --ftp-user=USER The username for the FTP server. Default: 'anonymous'
-p, --ftp-password=P The password for the FTP server. Default: '#'
-P, --ftp-port=X Port number for the FTP server. Default: 21
-r, --ftp-remote-dir=DIR The remote directory that is expected to resemble the
repository project directory
-a, --access-url=URL This is the URL that should be used when trying to SVN
export files so that they can be uploaded to the FTP
server
-s, --status-file=PATH Required. This script needs to store the last
successful revision that was transferred to the
server. PATH is the location of this file.
-d, --project-directory=DIR If the project you are interested in sending to
the FTP server is not under the root of the
repository (/), set this parameter.
Example: -d 'project1/trunk/'
This should NOT start with a '/'.
2008.5.2 CKS
Fixed possible Windows-related bug with tempfile, where the script didn't have
permission to write to the tempfile. Replaced this with a open()-created file
created in the CWD.
2008.5.13 CKS
Added error logging. Added exception for file-not-found errors when deleting files.
2008.5.14 CKS
Change file open to 'rb' mode, to prevent Python's universal newline support from
stripping CR characters, causing later comparisons between FTP and SVN to report changes.
"""
try:
    import sys, os
    import logging
    logging.basicConfig(
        level=logging.DEBUG,
        format='%(asctime)s %(levelname)s %(message)s',
        filename='svn2ftp.debug.log',
        filemode='a'
    )
    console = logging.StreamHandler()
    console.setLevel(logging.ERROR)
    logging.getLogger('').addHandler(console)
    import getopt, tempfile, smtplib, traceback, subprocess
    from io import StringIO
    import pysvn
    import ftplib
    import inspect
except Exception as e:
    logging.error(e)
    # capture the location of the error
    frame = inspect.currentframe()
    stack_trace = traceback.format_stack(frame)
    logging.debug(stack_trace)
    print(stack_trace)
    # end capture
    sys.exit(1)

# defaults
host = ""
user = "anonymous"
password = "#"
port = 21
repo_path = ""
local_repos_path = ""
status_file = ""
project_directory = ""
remote_base_directory = ""
toAddrs = "developers@company.com"
youngest_revision = ""
def email(toAddrs, message, subject, fromAddr='autonote@company.com'):
    headers = "From: %s\r\nTo: %s\r\nSubject: %s\r\n\r\n" % (fromAddr, toAddrs, subject)
    message = headers + message
    logging.info('sending email to %s...' % toAddrs)
    server = smtplib.SMTP('smtp.company.com')
    server.set_debuglevel(1)
    server.sendmail(fromAddr, toAddrs, message)
    server.quit()
    logging.info('email sent')


def captureErrorMessage(e):
    sout = StringIO()
    traceback.print_exc(file=sout)
    errorMessage = '\n' + ('*' * 80) + ('\n%s' % e) + ('\n%s\n' % sout.getvalue()) + ('*' * 80)
    return errorMessage


def usage_and_exit(errmsg=None):
    """Print a usage message, plus an ERRMSG (if provided), then exit.
    If ERRMSG is provided, the usage message is printed to stderr and
    the script exits with a non-zero error code. Otherwise, the usage
    message goes to stdout, and the script exits with a zero
    errorcode."""
    if errmsg is None:
        stream = sys.stdout
    else:
        stream = sys.stderr
    print(__doc__, file=stream)
    if errmsg:
        print("\nError: %s" % (errmsg), file=stream)
        sys.exit(2)
    sys.exit(0)
def read_args():
    global host
    global user
    global password
    global port
    global repo_path
    global local_repos_path
    global status_file
    global project_directory
    global remote_base_directory
    global youngest_revision
    try:
        opts, args = getopt.gnu_getopt(sys.argv[1:], "?u:p:P:r:a:s:d:SU:SP:",
                                       ["help",
                                        "ftp-user=",
                                        "ftp-password=",
                                        "ftp-port=",
                                        "ftp-remote-dir=",
                                        "access-url=",
                                        "status-file=",
                                        "project-directory=",
                                        "svnUser=",
                                        "svnPass="
                                        ])
    except getopt.GetoptError as msg:
        usage_and_exit(msg)
    for opt, arg in opts:
        if opt in ("-?", "--help"):
            usage_and_exit()
        elif opt in ("-u", "--ftp-user"):
            user = arg
        elif opt in ("-p", "--ftp-password"):
            password = arg
        elif opt in ("-SU", "--svnUser"):
            svnUser = arg
        elif opt in ("-SP", "--svnPass"):
            svnPass = arg
        elif opt in ("-P", "--ftp-port"):
            try:
                port = int(arg)
            except ValueError as msg:
                usage_and_exit("Invalid value '%s' for --ftp-port." % (arg))
            if port < 1 or port > 65535:
                usage_and_exit("Value for --ftp-port must be a positive integer less than 65536.")
        elif opt in ("-r", "--ftp-remote-dir"):
            remote_base_directory = arg
        elif opt in ("-a", "--access-url"):
            repo_path = arg
        elif opt in ("-s", "--status-file"):
            status_file = os.path.abspath(arg)
        elif opt in ("-d", "--project-directory"):
            project_directory = arg
    if len(args) != 3:
        print(str(args))
        usage_and_exit("host and/or local_repos_path not specified (%d)" % len(args))
    host = args[0]
    print("args1: " + args[1])
    print("args0: " + args[0])
    print("abspath: " + os.path.abspath(args[1]))
    local_repos_path = os.path.abspath(args[1])
    print('project_dir:', project_directory)
    youngest_revision = int(args[2])
    if status_file == "":
        usage_and_exit("No status file specified")
def main():
    global host
    global user
    global password
    global port
    global repo_path
    global local_repos_path
    global status_file
    global project_directory
    global remote_base_directory
    global youngest_revision
    read_args()
    # repository, fs_ptr
    # get youngest revision
    print("local_repos_path: " + local_repos_path)
    print('getting youngest revision...')
    # youngest_revision = fs.youngest_rev(fs_ptr)
    assert youngest_revision, "Unable to lookup youngest revision."
    last_sent_revision = get_last_revision()
    if youngest_revision == last_sent_revision:
        # no need to continue. we should be up to date.
        print('done, up-to-date')
        return
    if last_sent_revision or youngest_revision < 10:
        # Only compare revisions if the DAT file contains a valid
        # revision number. Otherwise we risk waiting forever while
        # we parse and upload every revision in the repo in the case
        # where a repository is retroactively configured to sync with ftp.
        pysvn_client = pysvn.Client()
        pysvn_client.callback_get_login = get_login
        rev1 = pysvn.Revision(pysvn.opt_revision_kind.number, last_sent_revision)
        rev2 = pysvn.Revision(pysvn.opt_revision_kind.number, youngest_revision)
        summary = pysvn_client.diff_summarize(repo_path, rev1, repo_path, rev2, True, False)
        print('summary len:', len(summary))
        if len(summary) > 0:
            print('connecting to %s...' % host)
            ftp = FTPClient(host, user, password)
            print('connected to %s' % host)
            ftp.base_path = remote_base_directory
            print('set remote base directory to %s' % remote_base_directory)
            # iterate through all the differences between revisions
            for change in summary:
                # determine whether the path of the change is relevant to the path that is being sent,
                # and modify the path as appropriate.
                print('change path:', change.path)
                ftp_relative_path = apply_basedir(change.path)
                print('ftp rel path:', ftp_relative_path)
                # only try to sync the path if it is in our project_directory
                if ftp_relative_path != "":
                    is_file = (change.node_kind == pysvn.node_kind.file)
                    if str(change.summarize_kind) == "delete":
                        print("deleting: " + ftp_relative_path)
                        try:
                            ftp.delete_path("/" + ftp_relative_path, is_file)
                        except ftplib.error_perm as e:
                            if 'cannot find the' in str(e) or 'not found' in str(e):
                                # Log, but otherwise ignore path-not-found errors
                                # when deleting, since it's not a disaster if the file
                                # we want to delete is already gone.
                                logging.error(captureErrorMessage(e))
                            else:
                                raise
                    elif str(change.summarize_kind) == "added" or str(change.summarize_kind) == "modified":
                        local_file = ""
                        if is_file:
                            local_file = svn_export_temp(pysvn_client, repo_path, rev2, change.path)
                        print("uploading file: " + ftp_relative_path)
                        ftp.upload_path("/" + ftp_relative_path, is_file, local_file)
                        if is_file:
                            os.remove(local_file)
                    elif str(change.summarize_kind) == "normal":
                        print("skipping 'normal' element: " + ftp_relative_path)
                    else:
                        raise Exception("Unknown change summarize kind: " + str(change.summarize_kind) + ", path: " + ftp_relative_path)
            ftp.close()
    # write back the last revision that was synced
    print("writing last revision: " + str(youngest_revision))
    set_last_revision(youngest_revision)  # todo: undo
def get_login(a, b, c, d):
    # arguments don't matter, we're always going to return the same thing
    try:
        return True, "svnUsername", "svnPassword", True
    except Exception as e:
        logging.error(e)
        # capture the location of the error
        frame = inspect.currentframe()
        stack_trace = traceback.format_stack(frame)
        logging.debug(stack_trace)
        # end capture
        sys.exit(1)


# functions for persisting the last successfully synced revision
def get_last_revision():
    if os.path.isfile(status_file):
        f = open(status_file, 'r')
        line = f.readline()
        f.close()
        try:
            i = int(line)
        except ValueError:
            i = 0
    else:
        i = 0
        f = open(status_file, 'w')
        f.write(str(i))
        f.close()
    return i


def set_last_revision(rev):
    f = open(status_file, 'w')
    f.write(str(rev))
    f.close()
# augmented ftp client class that can work off a base directory
class FTPClient(ftplib.FTP):

    def __init__(self, host, username, password):
        self.base_path = ""
        self.current_path = ""
        ftplib.FTP.__init__(self, host, username, password)

    def cwd(self, path):
        debug_path = path
        if self.current_path == "":
            self.current_path = self.pwd()
            print("pwd: " + self.current_path)
        if not os.path.isabs(path):
            debug_path = self.base_path + "<" + path
            path = os.path.join(self.current_path, path)
        elif self.base_path != "":
            debug_path = self.base_path + ">" + path.lstrip("/")
            path = os.path.join(self.base_path, path.lstrip("/"))
        path = os.path.normpath(path)
        # by this point the path should be absolute.
        if path != self.current_path:
            print("change from " + self.current_path + " to " + debug_path)
            ftplib.FTP.cwd(self, path)
            self.current_path = path
        else:
            print("staying put : " + self.current_path)

    def cd_or_create(self, path):
        assert os.path.isabs(path), "absolute path expected (" + path + ")"
        try:
            self.cwd(path)
        except ftplib.error_perm as e:
            for folder in path.split('/'):
                if folder == "":
                    self.cwd("/")
                    continue
                try:
                    self.cwd(folder)
                except:
                    print("mkd: (" + path + "):" + folder)
                    self.mkd(folder)
                    self.cwd(folder)

    def upload_path(self, path, is_file, local_path):
        if is_file:
            (path, filename) = os.path.split(path)
            self.cd_or_create(path)
            # Use read-binary to avoid universal newline support from stripping CR characters.
            f = open(local_path, 'rb')
            self.storbinary("STOR " + filename, f)
            f.close()
        else:
            self.cd_or_create(path)

    def delete_path(self, path, is_file):
        (path, filename) = os.path.split(path)
        print("trying to delete: " + path + ", " + filename)
        self.cwd(path)
        try:
            if is_file:
                self.delete(filename)
            else:
                self.delete_path_recursive(filename)
        except ftplib.error_perm as e:
            if 'The system cannot find the' in str(e) or '550 File not found' in str(e):
                # Log, but otherwise ignore path-not-found errors
                # when deleting, since it's not a disaster if the file
                # we want to delete is already gone.
                logging.error(captureErrorMessage(e))
            else:
                raise

    def delete_path_recursive(self, path):
        if path == "/":
            raise Exception("WARNING: trying to delete '/'!")
        for node in self.nlst(path):
            if node == path:
                # it's a file. delete and return
                self.delete(path)
                return
            if node != "." and node != "..":
                self.delete_path_recursive(os.path.join(path, node))
        try:
            self.rmd(path)
        except ftplib.error_perm as msg:
            sys.stderr.write("Error deleting directory " + os.path.join(self.current_path, path) + " : " + str(msg))
# apply the project_directory setting
def apply_basedir(path):
    # remove any leading stuff (in this case, "trunk/") and decide whether the file should be propagated
    if not path.startswith(project_directory):
        return ""
    return path.replace(project_directory, "", 1)


def svn_export_temp(pysvn_client, base_path, rev, path):
    # Causes an access denied error. Couldn't deduce the Windows-perm issue.
    # It's possible Python isn't garbage-collecting the open file-handle in time for pysvn to re-open it.
    # Regardless, just generating a simple filename seems to work.
    # (fd, dest_path) = tempfile.mkstemp()
    dest_path = tmpName = '%s.tmp' % __file__
    exportPath = os.path.join(base_path, path).replace('\\', '/')
    print('exporting %s to %s' % (exportPath, dest_path))
    pysvn_client.export(exportPath,
                        dest_path,
                        force=False,
                        revision=rev,
                        native_eol=None,
                        ignore_externals=False,
                        recurse=True,
                        peg_revision=rev)
    return dest_path
if __name__ == "__main__":
    logging.info('svnftp.start')
    try:
        main()
        logging.info('svnftp.done')
    except Exception as e:
        # capture the location of the error for debug purposes
        frame = inspect.currentframe()
        stack_trace = traceback.format_stack(frame)
        logging.debug(stack_trace[:-1])
        print(stack_trace)
        # end capture
        error_text = '\nFATAL EXCEPTION!!!\n' + captureErrorMessage(e)
        subject = "ALERT: SVN2FTP Error"
        message = """An Error occurred while trying to FTP an SVN commit.
repo_path = %(repo_path)s\n
local_repos_path = %(local_repos_path)s\n
project_directory = %(project_directory)s\n
remote_base_directory = %(remote_base_directory)s\n
error_text = %(error_text)s
""" % globals()
        email(toAddrs, message, subject)
        logging.error(e)
Notes/Disclaimers:
I have basically no python training so I'm learning as I go and spending lots of time reading docs to figure stuff out.
The body of get_login is in a try block because I was getting strange errors saying there was an unhandled exception in callback_get_login. Never figured out why, but it seems fine now. Let sleeping dogs lie, right?
The username and password for get_login are currently hard-coded (but correct) just to eliminate variables and try to change as little as possible at once. (I added the svnuser and svnpass arguments to the existing argument parsing.)
So that's where I am. I can't figure out why on earth it's not printing anything into svn2ftp.out.log. If you're wondering, the output for one of these failed attempts in svn2ftp.debug.log is:
2012-09-06 15:18:12,496 INFO svnftp.start
2012-09-06 15:18:12,496 INFO svnftp.done
And it's no different on a successful run. So there's nothing useful being logged.
I'm lost. I've gone way down the rabbit hole on this one, and don't know where to go from here. Any ideas?
It looks as if you are overwriting your logging level. Try setting both to DEBUG and see what happens.
import sys, os
import logging
logging.basicConfig(
    level=logging.DEBUG,  # DEBUG here
    format='%(asctime)s %(levelname)s %(message)s',
    filename='svn2ftp.debug.log',
    filemode='a'
)
console = logging.StreamHandler()
console.setLevel(logging.ERROR)  # ERROR here
logging.getLogger('').addHandler(console)
Additionally, you are printing in some places and logging in others. The logging library does not redirect sys.stdout to the logging handlers, so output from print statements will not appear in your log file. I would convert all print statements to logging statements to be consistent.
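For example, a sketch of what the conversion could look like for the prints in read_args():

# instead of print("args1: " + args[1]) and friends:
logging.debug("args1: %s", args[1])
logging.debug("args0: %s", args[0])
logging.debug("abspath: %s", os.path.abspath(args[1]))
logging.info("getting youngest revision...")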
I am new to the Python language, so please bear with me. Also, English isn't my native language, so sorry for any misspelled words.
I have a question about updating a Django app from a daemon that runs locally on my server. I have a server setup with 8 hot-swappable bays. Users can plug their hard disk(s) into the server and, after the server has detected that a new hard disk is plugged in, it starts copying the contents of the hard disk to a location on the network. The current setup displays information about the process on an LCD screen.
The current setup works fine, but I need to change it so that the whole process is displayed on a website (since this is more user friendly). So I need to show the user when a disk is inserted into the server, the progress of the copy task, etc.
My idea is to create a Django app that gets updated when a task in progress completes, but I can't seem to find any information about updating a Django app from a locally running daemon. Is this even possible? Or is Django not the right way to go? Any ideas are welcome.
Below is my script used to copy the contents of a disk to a location on the network. Hopefully it gives some more information about what I'm doing/trying to do.
Many thanks in advance!
Script:
#!/usr/bin/env python
import os
import sys
import glob
import re
import time
import datetime
import pyudev
import thread
import Queue
import gobject
import getopt
from pyudev import Context
from subprocess import Popen, PIPE
from subprocess import check_call
from lcdproc.server import Server
from pyudev.glib import GUDevMonitorObserver
from gobject import MainLoop
from threading import Thread
#used to show progress info
from progressbar import ProgressBar, Percentage, Bar, RotatingMarker, ETA, FileTransferSpeed
# used to set up screens
lcd = Server("localhost", 13666, debug=False)
screens = []
widgets = []
#Used for threading
disk_work_queue = Queue.Queue()
# used to store remote nfs folders
remote_dirs = ['/mnt/nfs/', '/mnt/nfs1/', '/mnt/nfs2/']
#Foldername on remote server (NFS Share name)
REMOTE_NFS_SHARE = ''
# a process that runs indefinitely; it starts disk processing
# functions.
class ProcessThread(Thread):

    def __init__(self):
        Thread.__init__(self)

    def run(self):
        while 1:
            try:
                disk_to_be_processed = disk_work_queue.get(block=False)
                set_widget_text(disk_to_be_processed[1], "Removed from queue..", "info", "on")
                process_disk(disk_to_be_processed[0], disk_to_be_processed[1])
            except Queue.Empty:
                time.sleep(10)
                set_main_widget_text("Please insert disks ")
# used to set messages on the LCD screen; messages are set per disk
def set_widget_text(host, message, priority, blacklight):
    if host == "host4":
        screen_disk1 = screens[1]
        screen_disk1.clear()
        screen_disk1.set_priority(priority)
        screen_disk1.set_backlight(blacklight)
        widgets[1].set_text(str(message))
    elif host == "host5":
        screen_disk2 = screens[2]
        screen_disk2.clear()
        screen_disk2.set_priority(priority)
        screen_disk2.set_backlight(blacklight)
        widgets[2].set_text(str(message))
    elif host == "host6":
        screen_disk3 = screens[3]
        screen_disk3.clear()
        screen_disk3.set_priority(priority)
        screen_disk3.set_backlight(blacklight)
        widgets[3].set_text(str(message))
    elif host == "host7":
        screen_disk4 = screens[4]
        screen_disk4.clear()
        screen_disk4.set_priority(priority)
        screen_disk4.set_backlight(blacklight)
        widgets[4].set_text(str(message))


# used to set a message for all hosts
def set_widget_text_all(hosts, message, priority, blacklight):
    for host in hosts:
        set_widget_text(host, message, priority, blacklight)


def set_main_widget_text(message):
    screen_disk1 = screens[0]
    screen_disk1.clear()
    screen_disk1.set_priority("info")
    screen_disk1.set_backlight("on")
    widgets[0].set_text(str(message))
# mount, find log files and copy image files to the destination
def process_disk(disk, host):
    datadisk = mount_disk(disk, host)
    source = datadisk + "/images"
    set_widget_text(host, "Processing, hold on ", "info", "on")
    cases = find_log(source)
    upload(source, cases, host)
    time.sleep(5)
    umount_disk(host)
    set_widget_text(host, "Disk can be removed", "info", "blink")
    time.sleep(10)


# search the datadisk for logfiles containing information
# about cases and images
def find_log(src):
    inf = ""
    case = []
    for root, dirs, files in os.walk(src):
        for f in files:
            if f.endswith(".log"):
                log = open(os.path.join(root, f), 'r')
                lines = log.readlines()[2:5]
                for l in lines:
                    inf += re.sub("\n", "", l[11:]) + ":"
                log.close()
                print inf
                case.append(inf)
                inf = ""
    return case


def get_directory_size(dir):
    dir_size = 0
    for (path, dirs, files) in os.walk(dir):
        for file in files:
            filename = os.path.join(path, file)
            dir_size += os.path.getsize(filename)
    return dir_size
# copies the image files to the destination location; dc3dd is used
# to copy the files in a forensically correct way.
def upload(src, cases, host):
    remotedir = ''
    while len(cases) > 0:
        count = 0
        nfs_share_found = False
        case = cases.pop()
        onderzoek = case.split(':')[0]
        # remove the _ from the name of the object
        object = case.split(':')[1]
        # image = case.split(':')[2]
        localdir = src + '/' + onderzoek + '/' + object + '/'
        total_files = len(os.listdir(localdir))
        folder_size = get_directory_size(localdir)
        for d in remote_dirs:
            if os.path.exists(d + onderzoek + '/B/' + object.replace('_', ' ') + '/Images/'):
                nfs_share_found = True
                remotedir = d + onderzoek + '/B/' + object.replace('_', ' ') + '/Images/'
                break
        if nfs_share_found == False:
            # "Onderzoek onbekend" is Dutch for "case unknown"
            set_widget_text(host, " Onderzoek onbekend ", "info", "flash")
            time.sleep(30)
            return
        for root, dirs, files in os.walk(localdir):
            for uploadfile in files:
                currentfile = os.path.join(root, uploadfile)
                file_size = os.stat(currentfile).st_size
                copy_imagefile(currentfile, onderzoek, object, remotedir)
                count += 1
                percentage = int(count * file_size * 100 / folder_size)
                message = onderzoek + " Obj: " + object + "..%d%%" % percentage
                set_widget_text(host, message, "info", "on")
    set_widget_text(host, " Copy Succesfull! ", "info", "flash")


# the actual function to copy the files, using dc3dd
def copy_imagefile(currentfile, onderzoek, object, remotedir):
    currentfilename = os.path.basename(currentfile)
    dc3dd = Popen(["dc3dd", "if=" + currentfile, "hash=md5",
                   "log=/tmp/" + onderzoek + "_" + object + ".log",
                   "hof=" + remotedir + currentfilename,
                   "verb=on", "nwspc=on"],
                  stdin=PIPE, stdout=PIPE, stderr=PIPE)
    dc3dd_stdout = dc3dd.communicate()[1]
    awk = Popen([r"awk", "NR==13 { print $1 }"], stdin=PIPE, stdout=PIPE)
    awk_stdin = awk.communicate(dc3dd_stdout)[0]
    output = awk_stdin.rstrip('\n')
    if output == "[ok]":
        return False
    else:
        return True
# when a disk gets inserted into the machine this function is called to prepare the disk
# for later use.
def device_added_callback(self, device):
    position = device.sys_path.find('host')
    host = device.sys_path[(position):(position + 5)]
    set_widget_text(host, " New disk inserted! ", "info", "on")
    time.sleep(2)
    disk = "/dev/" + device.sys_path[-3:] + "1"
    disk_work_queue.put((disk, host))
    set_widget_text(host, " Placed in queue... ", "info", "on")


# gets called when the disk is removed from the machine
def device_removed_callback(self, device):
    position = device.sys_path.find('host')
    host = device.sys_path[(position):(position + 5)]
    # message = 'Slot %s : Please remove drive' % host[4:]
    set_widget_text(host, " Replace disk ", "info", "on")


# mounts the partition on the datadisk
def mount_disk(disk, host):
    # device = "/dev/" + disk + "1"
    mount_point = "/mnt/" + host
    if not os.path.exists(mount_point):
        os.mkdir(mount_point)
    cmd = ['mount', '-o', 'ro,noexec,noatime,nosuid', str(disk), str(mount_point)]
    check_call(cmd)
    set_widget_text(host, " Disk mounted ", "info", "on")
    return mount_point


# unmounts the partition on the datadisk
def umount_disk(host):
    mount_point = "/mnt/" + host
    cmd = ['umount', str(mount_point)]
    check_call(cmd)
    os.removedirs(mount_point)
def build_screens():
    screen_main = lcd.add_screen("MAIN")
    screen_main.set_heartbeat("off")
    screen_main.set_duration(3)
    screen_main.set_priority("background")
    widget0_1 = screen_main.add_string_widget("screen0Widget1", " Welcome to AFFC ", x=1, y=1)
    widget0_2 = screen_main.add_string_widget("screen0Widget2", "Please insert disks ", x=1, y=2)
    widgets.append(widget0_2)
    screens.append(screen_main)
    screen_disk1 = lcd.add_screen("DISK1")
    screen_disk1.set_heartbeat("off")
    screen_disk1.set_duration(3)
    screen_disk1.clear()
    widget_disk1_1 = screen_disk1.add_string_widget("disk1Widget1", " Slot 1 ", x=1, y=1)
    widget_disk1_2 = screen_disk1.add_string_widget("disk1Widget2", " Please insert disk ", x=1, y=2)
    widgets.append(widget_disk1_2)
    screens.append(screen_disk1)
    screen_disk2 = lcd.add_screen("DISK2")
    screen_disk2.set_heartbeat("off")
    screen_disk2.set_duration(3)
    widget_disk2_1 = screen_disk2.add_string_widget("disk2Widget1", " Slot 2 ", x=1, y=1)
    widget_disk2_2 = screen_disk2.add_string_widget("disk2Widget2", " Please insert disk ", x=1, y=2)
    widgets.append(widget_disk2_2)
    screens.append(screen_disk2)
    screen_disk3 = lcd.add_screen("DISK3")
    screen_disk3.set_heartbeat("off")
    screen_disk3.set_duration(3)
    widget_disk3_1 = screen_disk3.add_string_widget("disk3Widget1", " Slot 3 ", x=1, y=1)
    widget_disk3_2 = screen_disk3.add_string_widget("disk3Widget2", " Please insert disk ", x=1, y=2)
    widgets.append(widget_disk3_2)
    screens.append(screen_disk3)
    screen_disk4 = lcd.add_screen("DISK4")
    screen_disk4.set_heartbeat("off")
    screen_disk4.set_duration(3)
    widget_disk4_1 = screen_disk4.add_string_widget("disk4Widget1", " Slot 4 ", x=1, y=1)
    widget_disk4_2 = screen_disk4.add_string_widget("disk4Widget2", " Please insert disk ", x=1, y=2)
    widgets.append(widget_disk4_2)
    screens.append(screen_disk4)


def restart_program():
    """Restarts the current program.
    Note: this function does not return. Any cleanup action (like
    saving data) must be done before calling this function."""
    python = sys.executable
    os.execl(python, python, *sys.argv)
def main():
    try:
        opts, args = getopt.getopt(sys.argv[1:], "hd:v", ["help", "destination="])
    except getopt.GetoptError, err:
        # print help information and exit:
        print str(err)  # will print something like "option -a not recognized"
        usage()
        sys.exit(2)
    verbose = False
    for o, a in opts:
        if o == "-v":
            verbose = True
        elif o in ("-h", "--help"):
            usage()
            sys.exit()
        elif o in ("-d", "--destination"):
            REMOTE_NFS_SHARE = a
        else:
            assert False, "unhandled option"
    lcd.start_session()
    build_screens()
    # t = Thread(target=loop_disks_process())
    # t.start()
    context = pyudev.Context()
    monitor = pyudev.Monitor.from_netlink(context)
    observer = GUDevMonitorObserver(monitor)
    observer.connect('device-added', device_added_callback)
    observer.connect('device-removed', device_removed_callback)
    monitor.filter_by(subsystem='block', device_type='disk')
    monitor.enable_receiving()
    mainloop = MainLoop()
    gobject.threads_init()
    t = ProcessThread()
    t.start()
    mainloop.run()
    raw_input("Hit <enter>")
    t.running = False
    t.join()


if __name__ == "__main__":
    try:
        main()
    except Exception, e:
        restart_program()
Sorry, much too much code to read there.
I'm not sure what you mean by "updating" a Django app. Do you mean adding some data into the database? This is easy to do, either by getting your script to write directly into the DB, or by using something like a custom Django management command which can use the ORM.
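A management command is only a few lines; a minimal sketch, where the app name yourapp, the command name and the Disk model are all made up for illustration:

# yourapp/management/commands/set_disk_state.py -- hypothetical app, command and model names
from django.core.management.base import BaseCommand

from yourapp.models import Disk  # hypothetical model with slot/state fields


class Command(BaseCommand):
    help = "Called by the copy daemon to record the state of a disk slot"

    def handle(self, *args, **options):
        slot, state = args[0], args[1]
        # create the row on first sight of a slot, then update its state
        disk, created = Disk.objects.get_or_create(slot=slot)
        disk.state = state
        disk.save()

The daemon would then invoke it with something like python manage.py set_disk_state host4 copying.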
Take a look at Django Piston. You can implement a RESTful API on your Django app and call those APIs from your daemon. I use it on one of my projects, where some worker processes need to communicate with the frontend Django apps periodically.
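On the daemon side, the call is just a plain HTTP request; a hedged sketch (the endpoint URL and payload are made up, and it's written in Python 2 to match the daemon script):

# hedged sketch: POST copy progress to a hypothetical REST endpoint
# exposed by the Django app
import json
import urllib2

def report_progress(host, slot, percent):
    req = urllib2.Request("http://%s/api/disks/%s/progress/" % (host, slot),
                          json.dumps({"percent": percent}),
                          {"Content-Type": "application/json"})
    urllib2.urlopen(req)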
It could be done like this:
The daemon shares its disk information/copy progress using some inter-process communication method, like a simple text file or some in-memory object;
A Django view could then read this info and display it to the user (see the sketch below);
Or the daemon could call a Django management command (@Daniel Roseman) and that command would then update the app DB to represent the current state.
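A minimal sketch of the first two options, with the daemon dumping its status to a JSON file and a Django view reading it back (the file path and fields are hypothetical):

# hedged sketch: daemon side writes, Django side reads
import json
import os

from django.http import HttpResponse

STATUS_FILE = '/var/run/diskcopyd/status.json'  # hypothetical location

# daemon side: called whenever progress changes
def write_status(status):
    tmp = STATUS_FILE + '.tmp'
    with open(tmp, 'w') as f:
        json.dump(status, f)
    os.rename(tmp, STATUS_FILE)  # atomic: readers never see a half-written file

# Django side: a minimal view returning the raw status
def disk_status(request):
    with open(STATUS_FILE) as f:
        return HttpResponse(f.read(), content_type='application/json')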
Consider using something like Memcached as a shared area to store the state of the drives.
As the drives are added or removed, the daemon should write those changes to Memcached, and on each page load the Django web app should read the state from Memcached. You could use a management command and a SQL database, but that seems like too many moving parts for such a simple problem: you're only storing a handful of boolean flags.
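A minimal sketch of that idea, using the python-memcached client (the key names are made up):

import memcache

mc = memcache.Client(['127.0.0.1:11211'])

# daemon side: record that slot host4 is busy copying
mc.set('disk.host4', {'state': 'copying', 'percent': 42})

# Django view side: read the flag back on each page load
state = mc.get('disk.host4') or {'state': 'empty'}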
You might even try a micro-framework like Flask instead of Django, to reduce the complexity even more.