How to list all folders in an FTP directory? - python

I am trying to list the folders in an FTP directory so that the output looks like this:
[32114, 32115, 32116, ..., 42123]
The following script connects to the FTP site. How can I list the folders in the FTP directory, using the attached script as a starting point?
import arcpy, ftplib, os, socket

HOST = 'atlas.ca.gov'
DIRN = '/pub/naip/2012/doqqs_combined_color-nir/'
workspace = 'F:/download/location'

# Get current working directory
print 'The current working directory is %s' % os.getcwd()

# Change the current working directory to a download folder
os.chdir(workspace)
print 'The workspace has been changed to %s' % workspace

try:
    f = ftplib.FTP(HOST)
except (socket.error, socket.gaierror), e:
    print 'ERROR: cannot reach "%s"' % HOST
print '*** Connected to host "%s"' % HOST

try:
    f.login()
except ftplib.error_perm:
    print 'Error: cannot login anonymously'
    f.quit()
print '*** Logged in as "anonymous"'

try:
    f.cwd(DIRN)
except ftplib.error_perm:
    print 'ERROR: cannot CD to "%s"' % DIRN
    f.quit()
print '*** Changed to "%s" folder' % DIRN

If you look at the ftplib docs, there are two obvious functions for this: nlst and dir.
Either one will give you a list of all members in the directory, both files and subdirectories.
With nlst, you don't get any information beyond the name. But if you were planning to chdir into or otherwise use each one of them anyway, that's OK; you will get exceptions for the ones that turn out to be regular files, and you can just skip over them.
With dir, you get a full directory listing in human-readable form, which you will have to capture (by passing a callback function) and then parse manually. That is a lot less fun, but it is the only way to know in advance which members are files and which are directories.
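For example, a minimal sketch of both approaches against the host and path from the question (the cwd-probe trick for telling directories from files is an assumption about the server's behaviour, not something ftplib guarantees):

from ftplib import FTP, error_perm

ftp = FTP('atlas.ca.gov')
ftp.login()
ftp.cwd('/pub/naip/2012/doqqs_combined_color-nir/')

# dir(): capture the human-readable listing by passing a callback
lines = []
ftp.dir(lines.append)            # each listing line is handed to the callback

# nlst(): names only; probe each one with cwd() to see whether it is a directory
folders = []
for name in ftp.nlst():
    try:
        ftp.cwd(name)            # succeeds only for directories
        ftp.cwd('..')
        folders.append(name)
    except error_perm:           # regular files typically raise 550 here
        pass
print folders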

If the server is a Windows machine, the output of ftp.dir() will look like:
01-14-14 04:21PM <DIR> Output
If the server is of the *nix variety, the output will look like:
drwx------ 3 user group 512 Apr 22 2005 Mail
If you want to check whether an item is a directory, apply a regular expression to each line: a Windows FTP server's listing will contain <DIR> if the item is a directory, and a *nix FTP server's listing will start with d.
Do you need help with Python regexes?
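A minimal sketch of that check, assuming the two listing formats shown above (the patterns are illustrative rather than exhaustive, and the last column is taken as the name, which breaks on names containing spaces):

import re

def is_directory(listing_line):
    # Windows/IIS style: "01-14-14 04:21PM <DIR> Output"
    if re.search(r'<DIR>', listing_line):
        return True
    # *nix style: "drwx------ 3 user group 512 Apr 22 2005 Mail"
    return re.match(r'^d', listing_line) is not None

lines = []
ftp.dir(lines.append)            # ftp is an already-connected ftplib.FTP object
dirs = [line.split()[-1] for line in lines if is_directory(line)]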

Building on the answer by abarnert, the following produces the list:
from ftplib import FTP
ftp = FTP('atlas.ca.gov')
ftp.login()
ftp.cwd('/pub/naip/2012/doqqs_combined_color-nir/')
print ftp.nlst()

Related

Avoid Overriding Existing File [duplicate]

I am using pysftp library's get_r function (https://pysftp.readthedocs.io/en/release_0.2.9/pysftp.html#pysftp.Connection.get_r) to get a local copy of a directory structure from sftp server.
Is that the correct approach for a situation where the contents of the remote directory have changed and I would like to fetch only the files that changed since the last time the script was run?
The script should sync the remote directory recursively and mirror its state, e.g. with a parameter controlling whether outdated local files (those no longer present on the remote server) should be removed, while any changes to existing files and any new files should be fetched.
My current approach is here.
Example usage:
from sftp_sync import sync_dir
sync_dir('/remote/path/', '/local/path/')
Use pysftp.Connection.listdir_attr to get a file listing with attributes (including the file timestamp).
Then iterate over the list and compare against the local files.
import os
import pysftp
import stat

remote_path = "/remote/path"
local_path = "/local/path"

with pysftp.Connection('example.com', username='user', password='pass') as sftp:
    sftp.cwd(remote_path)
    for f in sftp.listdir_attr():
        if not stat.S_ISDIR(f.st_mode):
            print("Checking %s..." % f.filename)
            local_file_path = os.path.join(local_path, f.filename)
            if ((not os.path.isfile(local_file_path)) or
                    (f.st_mtime > os.path.getmtime(local_file_path))):
                print("Downloading %s..." % f.filename)
                sftp.get(f.filename, local_file_path)
Though these days, you should not use pysftp, as it is dead. Use Paramiko directly instead. See pysftp vs. Paramiko. The above code will work with Paramiko too with its SFTPClient.listdir_attr.
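A rough Paramiko equivalent of the loop above (a sketch only: host, credentials and paths are placeholders, and the AutoAddPolicy line is a convenience that skips host-key verification):

import os
import stat
import paramiko

remote_path = "/remote/path"
local_path = "/local/path"

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())   # convenience only; verify host keys in production
ssh.connect('example.com', username='user', password='pass')
sftp = ssh.open_sftp()
try:
    sftp.chdir(remote_path)
    for f in sftp.listdir_attr():
        if not stat.S_ISDIR(f.st_mode):
            local_file_path = os.path.join(local_path, f.filename)
            if (not os.path.isfile(local_file_path)
                    or f.st_mtime > os.path.getmtime(local_file_path)):
                sftp.get(f.filename, local_file_path)
finally:
    sftp.close()
    ssh.close()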

Python - download all folders, subfolders, and files with the python ftplib module

I have been working all day trying to figure out how to use the Python ftplib module to download folders, subfolders, and files from an FTP server, but I could only come up with this:
from ftplib import FTP
import sys, ftplib

sys.tracebacklimit = 0       # Does not display traceback errors
sys.stderr = "/dev/null"     # Does not display Attribute errors

Host = "ftp.debian.org"
Port = 21
Username = ""
Password = ""

def MainClass():
    global ftp
    global con
    Host
    Port
    ftp = FTP()
    con = ftp.connect(Host, Port)  # Connects to the host with the specified port

def grabfile():
    source = "/debian/"
    filename = "README.html"
    ftp.cwd(source)
    localfile = open(filename, 'wb')
    ftp.retrbinary('RETR ' + filename, localfile.write)
    ftp.quit()
    localfile.close()

try:
    MainClass()
except Exception:
    print "Not Connected"
    print "Check the address", Host + ":" + str(Port)
else:
    print "Connected"

if ftplib.error_perm and not Username == "" and Password == "":
    print "Please check your credentials\n", Username, "\n", Password

credentials = ftp.login(Username, Password)
grabfile()
This Python script will download the README.html file from ftp.debian.org, but I would like to be able to download whole folders, with the files and subfolders in them, and I cannot seem to figure that out. I have searched around for different Python scripts using this module, but I cannot find any that do what I want.
Any suggestions or help would be greatly appreciated.
Note:
I would still like to use python for this job but it could be a different module such as ftputil or any other one out there.
Thanks in advance,
Alex
The short solution:
You could possibly just run "wget -r ftp://username:password@ftp.debian.org/debian/*" to get all the files under the debian directory.
Then you can process the files in Python.
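If you go that route, a minimal sketch of driving wget from Python (the URL mirrors the command above with placeholder credentials; --no-parent is an extra flag added here to keep the crawl below /debian/):

import subprocess

# Let wget do the recursive FTP walk, then work on the local copy from Python.
subprocess.check_call([
    'wget', '-r', '--no-parent',
    'ftp://username:password@ftp.debian.org/debian/',
])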
The long solution:
You can walk the tree yourself with ftplib: get a directory listing, parse it, download every file, and recurse into the subdirectories.
If you search the web you'll find previous posts on Stack Overflow that solve this issue.
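For example, a minimal sketch of that recursive approach (it tells directories from files by attempting cwd(), which is a common trick rather than a guarantee of the protocol; the starting path is just an example):

import os
from ftplib import FTP, error_perm

def download_tree(ftp, remote_dir, local_dir):
    # Mirror remote_dir into local_dir, recursing into subdirectories.
    if not os.path.isdir(local_dir):
        os.makedirs(local_dir)
    ftp.cwd(remote_dir)
    for name in ftp.nlst():
        if name in ('.', '..'):
            continue
        local_path = os.path.join(local_dir, name)
        try:
            ftp.cwd(name)                    # succeeds only for directories
            is_dir = True
            ftp.cwd(remote_dir)              # go back before recursing
        except error_perm:
            is_dir = False
        if is_dir:
            download_tree(ftp, remote_dir + name + '/', local_path)
            ftp.cwd(remote_dir)              # restore the working directory afterwards
        else:
            with open(local_path, 'wb') as local_file:
                ftp.retrbinary('RETR ' + name, local_file.write)

ftp = FTP('ftp.debian.org')
ftp.login()
download_tree(ftp, '/debian/doc/', 'debian_doc')
ftp.quit()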

Python - FileNotFoundError when dealing with DMZ

I created a Python script to copy files from a source folder to a destination folder; the script runs fine on my local machine.
However, when I tried to change the source to a path located on a server in a DMZ and the destination to a folder on a local server, I got the following error:
FileNotFoundError: [WinError 3] The system cannot find the path specified: '\reports'
And here is the script:
import sys, os, shutil
import glob
import os.path, time

fob = open(r"C:\Log.txt", "a")
dir_src = r"\reports"
dir_dst = r"C:\Dest"
dir_bkp = r"C:\Bkp"

for w in list(set(os.listdir(dir_src)) - set(os.listdir(dir_bkp))):
    if w.endswith('.nessus'):
        pathname = os.path.join(dir_src, w)
        Date_File = "%s" % time.ctime(os.path.getmtime(pathname))
        print (Date_File)
        if os.path.isfile(pathname):
            shutil.copy2(pathname, dir_dst)
            shutil.copy2(pathname, dir_bkp)
            fob.write("File Name: %s" % os.path.basename(pathname))
            fob.write(" Last modified Date: %s" % time.ctime(os.path.getmtime(pathname)))
            fob.write(" Copied On: %s" % time.strftime("%c"))
            fob.write("\n")

fob.close()
os.system("PAUSE")
Okay, we first need to see what kind of remote folder you have.
If your remote folder is a shared Windows network folder, try mapping it as a network drive: http://windows.microsoft.com/en-us/windows/create-shortcut-map-network-drive#1TC=windows-7
Then you can just use something like Z:\reports to access your files.
If your remote folder is actually on a Unix server, you could use paramiko to access it and copy files from it:
import paramiko, sys, os, posixpath, re

def copyFilesFromServer(server, user, password, remotedir, localdir, filenameRegex='*', autoTrust=True):
    # Set up an SSH connection for listing the remote directory
    sshClient = paramiko.SSHClient()
    if autoTrust:
        sshClient.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # No trust issues! (yes, this could potentially be abused by someone malicious with access to the internal network)
    sshClient.connect(server, username=user, password=password)
    # Set up an SFTP connection for copying files
    t = paramiko.Transport((server, 22))
    t.connect(username=user, password=password)
    sftpClient = paramiko.SFTPClient.from_transport(t)
    # List the matching files on the server
    stdin, stdout, stderr = sshClient.exec_command('cd {0}; ls | grep {1}'.format(remotedir, filenameRegex))
    fileList = stdout.read().split('\n')
    # TODO: filter out empties!
    for filename in fileList:
        try:
            sftpClient.get(posixpath.join(remotedir, filename), os.path.join(localdir, filename), callback=None)  # callback could show the number of bytes transferred so far
        except IOError as e:
            print 'Failed to download file <{0}> from <{1}> to <{2}>'.format(filename, remotedir, localdir)
If your remote folder is something served over the WebDAV protocol, I'm just as interested in an answer as you are.
If your remote folder is something else still, please explain. I have not yet found a solution that treats all equally, but I'm very interested in one.

How to check if a folder exists inside a directory, and create it if not, using python

I use the Python script below to check whether a folder exists in the root of my FTP server.
from ftplib import FTP

ftp = FTP('ftp.hostname.com')
ftp.login('login', 'password')
folderName = 'foldername'
if folderName in ftp.nlst():
    print 'YES'
else:
    print 'NO'
How can I modify the above script to look inside a specific folder instead of the root directory?
For example, I want to see if a folder named foo exists inside the www directory.
The goal of my question is to see whether the folder foo exists inside the www directory; if so, print cool!, and if not, create a folder called foo inside www.
from ftplib import FTP

ftp = FTP('ftp.hostname.com')
ftp.login('login', 'password')
where = 'www'
folderName = 'foldername'
if folderName in ftp.nlst(where):
    print 'YES'
else:
    print 'NO'
Just pass the directory you want to look in as the first argument of ftp.nlst().
After the hint from Hans, I searched on Google for those commands and found this link: http://docs.python.org/2/library/ftplib.html
from ftplib import FTP

ftp = FTP('ftp.hostname.com')
ftp.login('login', 'passwrd')
ftp.cwd('www')               # change into the 'www' directory
if 'foo' in ftp.nlst():      # check if 'foo' exists inside 'www'
    print 'YES'
    ftp.cwd('foo')           # change into the 'foo' directory
    ftp.retrlines('LIST')    # list directory contents
else:
    print 'NO'
    ftp.mkd('foo')           # create a new directory called 'foo' on the server
    ftp.cwd('foo')           # change into the 'foo' directory
    ftp.retrlines('LIST')    # list subdirectory contents
ftp.close()                  # close the connection
ftplib is a rather thin wrapper around the FTP protocol. You can look at http://en.wikipedia.org/wiki/List_of_FTP_commands to see what the FTP commands do.
Hint: look at CWD, LIST, MKD.
For LIST you will need ftp.retrlines, and you will have to parse the output to see whether an entry is a directory.
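A minimal sketch of that idea, assuming an already-logged-in ftplib.FTP object named ftp and the Unix- or IIS-style LIST output discussed earlier (the parsing is deliberately crude):

lines = []
ftp.cwd('www')                        # CWD into the parent directory
ftp.retrlines('LIST', lines.append)   # LIST, capturing each line via the callback

def listed_as_directory(line, name):
    # Unix-style listings start with 'd'; IIS-style listings contain '<DIR>'
    return line.endswith(' ' + name) and (line.startswith('d') or '<DIR>' in line)

if not any(listed_as_directory(line, 'foo') for line in lines):
    ftp.mkd('foo')                    # MKD: create the missing directory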

Python script to recursively search an FTP server for specific filenames newer than 24 hours

Our storage area ran into trouble with SMB connections and we have now been forced to use FTP to access files on a regular basis. So rather than using Bash, I am trying to use Python, but I am running into a few problems. The script needs to recursively search through the FTP directory, find all files matching "*1700_m30.mp4" that are newer than 24 hours, and then copy all these files locally.
This is what I have so far, but I can't seem to get the script to download the files or to get the stats that tell me whether they are newer than 24 hours.
#!/usr/bin/env python
# encoding: utf-8
import sys
import os
import ftplib
import ftputil
import fnmatch
import time

dir_dest = '/Volumes/VoigtKampff/Temp/TEST1/'  # Directory where the files need to be downloaded to
pattern = '*1700_m30.mp4'                      # filename pattern for what the script is looking for
print 'Looking for this pattern :', pattern    # print pattern
print "logging into GSP"                       # print

host = ftputil.FTPHost('xxx.xxx', 'xxx', 'xxxxx')  # ftp host info
recursive = host.walk("/GSPstor/xxxxx/xxx/xxx/xxx/xxxx", topdown=True, onerror=None)  # recursive search
for root, dirs, files in recursive:
    for name in files:
        print 'Files :', files                 # print all files it finds
    video_list = fnmatch.filter(files, pattern)
    print 'Files to be moved :', video_list    # print list of files to be moved
    if host.path.isfile(video_list):           # check whether the file is valid
        host.download(video_list, video_list, 'b')  # download file list
host.close
Here is the modified script based on the excellent recommendations from ottomeister (thank you!!). The last remaining issue is that it keeps re-downloading the files and overwriting the existing ones:
import sys
import os
import ftplib
import ftputil
import fnmatch
import time
from time import mktime
import datetime
import os.path, time
from ftplib import FTP

dir_dest = '/Volumes/VoigtKampff/Temp/TEST1/'  # Directory where the files need to be downloaded to
pattern = '*1700_m30.mp4'                      # filename pattern for what the script is looking for
print 'Looking for this pattern :', pattern    # print pattern

utc_datetime_less24H = datetime.datetime.utcnow() - datetime.timedelta(seconds=86400)  # UTC time minus 24 hours in seconds
print 'UTC time less than 24 Hours is: ', utc_datetime_less24H.strftime("%Y-%m-%d %H:%M:%S")  # print UTC time minus 24 hours in seconds
print "logging into GSP FTP"                   # print

with ftputil.FTPHost('xxxxxxxx', 'xxxxxx', 'xxxxxx') as host:  # ftp host info
    recursive = host.walk("/GSPstor/xxxx/com/xxxx/xxxx/xxxxxx", topdown=True, onerror=None)  # recursive search
    for root, dirs, files in recursive:
        for name in files:
            print 'Files :', files             # print all files it finds
        video_list = fnmatch.filter(files, pattern)  # collect all files that match pattern into variable: video_list
        statinfo = host.stat(root, video_list)       # get the stats from files in variable: video_list
        file_mtime = datetime.datetime.utcfromtimestamp(statinfo.st_mtime)
        print 'Files with pattern: %s and epoch mtime is: %s ' % (video_list, statinfo.st_mtime)
        print 'Last Modified: %s' % datetime.datetime.utcfromtimestamp(statinfo.st_mtime)
        if file_mtime >= utc_datetime_less24H:
            for fname in video_list:
                fpath = host.path.join(root, fname)
                if host.path.isfile(fpath):
                    host.download_if_newer(fpath, os.path.join(dir_dest, fname), 'b')
    host.close()
This line:
video_list = fnmatch.filter(files, pattern)
gets you a list of filenames that match your glob pattern. But this line:
if host.path.isfile(video_list): # check whether the file is valid
is bogus, because host.path.isfile() does not want a list of filenames as its argument. It wants a single pathname. So you need to iterate over video_list constructing one pathname at a time, passing each of those pathnames to host.path.isfile(), and then possibly downloading that particular file. Something like this:
import os.path

for fname in video_list:
    fpath = host.path.join(root, fname)
    if host.path.isfile(fpath):
        host.download(fpath, os.path.join(dir_dest, fname), 'b')
Note that I'm using host.path.join() to manage remote pathnames and os.path.join() to manage local pathnames. Also note that this puts all of the downloaded files into a single directory. If you want to put them into a directory hierarchy that mirrors the remote layout (you'll have to do something like that if the filenames in different remote directories can clash) then you'll need to construct a different destination path, and you'll probably have to create the local destination directory hierarchy too.
To get timestamp information use host.lstat() or host.stat() depending on how you want to handle symlinks.
And yes, that should be host.close(). Without it the connection will be closed after the host variable goes out of scope and is garbage-collected, but it's better to close it explicitly. Even better, use a with clause to ensure that the connection gets closed even if an exception causes this code to be abandoned before it reaches the host.close() call, like this:
with ftputil.FTPHost('xxx.xxx', 'xxx', 'xxxxx') as host:  # ftp host info
    recursive = host.walk(...)
    ...
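Putting those pieces together, a minimal sketch of the loop with the timestamp check and a mirrored local layout (host, remote root and destination are placeholders; host.lstat() is the call suggested above, and the 'b' mode argument follows the older ftputil API already used in the question):

import os
import time
import fnmatch
import ftputil

dir_dest = '/Volumes/VoigtKampff/Temp/TEST1/'
pattern = '*1700_m30.mp4'
cutoff = time.time() - 24 * 60 * 60                  # only files modified in the last 24 hours

with ftputil.FTPHost('xxx.xxx', 'xxx', 'xxxxx') as host:
    for root, dirs, files in host.walk('/GSPstor', topdown=True, onerror=None):
        for fname in fnmatch.filter(files, pattern):
            fpath = host.path.join(root, fname)
            if host.lstat(fpath).st_mtime < cutoff:
                continue                             # older than 24 hours: skip
            local_dir = os.path.join(dir_dest, root.lstrip('/'))  # mirror the remote layout
            if not os.path.isdir(local_dir):
                os.makedirs(local_dir)
            host.download_if_newer(fpath, os.path.join(local_dir, fname), 'b')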
