How to make Python check if an FTP directory exists? - python

I'm using this script to connect to sample ftp server and list available directories:
from ftplib import FTP
ftp = FTP('ftp.cwi.nl') # connect to host, default port (an example server; I'll use a different one)
ftp.login() # user anonymous, passwd anonymous@
ftp.retrlines('LIST') # list directory contents
ftp.quit()
How do I use the ftp.retrlines('LIST') output to check whether a directory (for example public_html) exists? If it exists, I want to cd into it, execute some other code, and exit; if not, execute that code right away and exit.

nlst() will return a list of all files on the FTP server. Just check if your folder name is there.
from ftplib import FTP
ftp = FTP('yourserver')
ftp.login('username', 'password')
folderName = 'yourFolderName'
if folderName in ftp.nlst():
    # do needed task
    pass
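Note that nlst() lists the current remote directory only. A short sketch of putting the check to use for the original question's flow (the server and folder names are the answer's placeholders):
from ftplib import FTP

ftp = FTP('yourserver')
ftp.login('username', 'password')
folderName = 'yourFolderName'
if folderName in ftp.nlst():
    ftp.cwd(folderName)  # directory exists: change into it
    # ... run the code that needs the directory ...
else:
    # directory missing: run the fallback code right away
    pass
ftp.quit()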

You can use a list. Example:
import ftplib

server = "localhost"
user = "user"
password = "test@email.com"
try:
    ftp = ftplib.FTP(server)
    ftp.login(user, password)
except Exception, e:
    print e
else:
    filelist = []  # to store all files
    ftp.retrlines('LIST', filelist.append)  # append to list
    f = 0
    for f in filelist:
        if "public_html" in f:
            # do something
            f = 1
    if f == 0:
        print "No public_html"
        # do your processing here

You can send "MLST path" over the control connection.
That will return a line including the type of the path (note 'type=dir' below):
250-Listing "/home/user":
modify=20131113091701;perm=el;size=4096;type=dir;unique=813gc0004; /
250 End MLST.
Translated into python that should be something along these lines:
import ftplib
ftp = ftplib.FTP()
ftp.connect('ftp.somedomain.com', 21)
ftp.login()
resp = ftp.sendcmd('MLST pathname')
if 'type=dir;' in resp:
    # it should be a directory
    pass
Of course the code above is not 100% reliable and would need a 'real' parser.
You can look at the implementation of the MLSD command in ftplib.py, which is very similar (MLSD differs from MLST in that the response is sent over the data connection, but the format of the transmitted lines is the same):
http://hg.python.org/cpython/file/8af2dc11464f/Lib/ftplib.py#l577
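On Python 3.3+, ftplib already exposes that parsing via FTP.mlsd(), which yields (name, facts) pairs. A sketch of a directory check built on it, assuming the server supports MLSD ('public_html' is just the example name):
import ftplib

def is_remote_dir(ftp, name):
    # facts is a dict of MLSD facts; 'type' is 'dir' for directories
    for entry_name, facts in ftp.mlsd():
        if entry_name == name and facts.get('type') == 'dir':
            return True
    return False

ftp = ftplib.FTP('ftp.somedomain.com')
ftp.login()
if is_remote_dir(ftp, 'public_html'):
    ftp.cwd('public_html')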

The examples attached to ghostdog74's answer have a bit of a bug: the list you get back is the whole line of the response, so you get something like
drwxrwxrwx 4 5063 5063 4096 Sep 13 20:00 resized
This means that if your directory name is something like '50' (as it was in my case), you'll get a false positive. I modified the code to handle this:
def directory_exists_here(self, directory_name):
    filelist = []
    self.ftp.retrlines('LIST', filelist.append)
    for f in filelist:
        if f.split()[-1] == directory_name:
            return True
    return False
N.B., this is inside an FTP wrapper class I wrote and self.ftp is the actual FTP connection.
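For context, a minimal sketch of what such a wrapper class might look like (the class name and constructor here are my own assumptions, not the answerer's actual code):
import ftplib

class FtpWrapper(object):
    def __init__(self, host, user, password):
        self.ftp = ftplib.FTP(host)
        self.ftp.login(user, password)

    def directory_exists_here(self, directory_name):
        filelist = []
        self.ftp.retrlines('LIST', filelist.append)
        for f in filelist:
            if f.split()[-1] == directory_name:
                return True
        return False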

Tom is correct, but no one voted him up. However, for the satisfaction of those who voted up ghostdog74, I will mix the two and write this code. It works for me; it should work for you too.
import ftplib

server = "localhost"
user = "user"
uploadToDir = "public_html"
password = "test@email.com"
try:
    ftp = ftplib.FTP(server)
    ftp.login(user, password)
except Exception, e:
    print e
else:
    filelist = []  # to store all files
    ftp.retrlines('NLST', filelist.append)  # append to list
    num = 0
    for f in filelist:
        if f.split()[-1] == uploadToDir:
            # do something
            num = 1
    if num == 0:
        print "No public_html"
        # do your processing here
First of all, if you follow ghostdog74's method, a check like "public" in f will evaluate to True even when the directory "public" doesn't exist, because the word "public" occurs in "public_html". That's where Tom's condition comes in, so I changed the test to if f.split()[-1] == uploadToDir:.
Also, if you ask for a directory name that doesn't exist while other files and folders are present, the second snippet by ghostdog74 will never print the not-found message, because f is never 0 after being overridden by the for loop. So I used a num variable instead of f, and voilà, the goodness follows.
Vinay and Jonathon are right about what they commented.

In 3.x the nlst() method is deprecated. Use this code:
import ftplib
remote = ftplib.FTP('example.com')
remote.login()
if 'foo' in [name for name, data in remote.mlsd()]:
    # do your stuff
    pass
The list comprehension extracts just the names: mlsd() returns a generator of (name, facts) tuples, so testing 'foo' in remote.mlsd() directly would compare the string against tuples and never match. (The in operator does work on generators by iterating them, so a separate list() call is not strictly required.)
You can wrap the [name for name, data in remote.mlsd()] list comprehension in a function or method and call it whenever you need to check whether a directory (or file) exists.
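A minimal sketch of such a helper (the function name remote_names is my own):
import ftplib

def remote_names(ftp):
    # Collect just the entry names from an MLSD listing of the current directory
    return [name for name, facts in ftp.mlsd()]

remote = ftplib.FTP('example.com')
remote.login()
if 'foo' in remote_names(remote):
    remote.cwd('foo')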

=> I found this web-page while googling for a way to check if a file exists using ftplib in python. The following is what I figured out (hope it helps someone):
=> When trying to list non-existent files/directories, ftplib raises an exception. Even though adding a try/except block is standard practice and a good idea, I would prefer my FTP scripts to download file(s) only after making sure they exist. This helps keep my scripts simpler, at least when listing a directory on the FTP server is possible.
For example, the Edgar FTP server has multiple files that are stored under the directory /edgar/daily-index/. Each file is named like "master.YYYYMMDD.idx". There is no guarantee that a file will exist for every date (YYYYMMDD): there is no file dated 24th Nov 2013, but there is a file dated 22nd Nov 2013. How does listing work in these two cases?
# Code
from __future__ import print_function
import ftplib
ftp_client = ftplib.FTP("ftp.sec.gov", "anonymous", "MY.EMAIL@gmail.com")
resp = ftp_client.sendcmd("MLST /edgar/daily-index/master.20131122.idx")
print(resp)
resp = ftp_client.sendcmd("MLST /edgar/daily-index/master.20131124.idx")
print(resp)
# Output
250-Start of list for /edgar/daily-index/master.20131122.idx
modify=20131123030124;perm=adfr;size=301580;type=file;unique=11UAEAA398;
UNIX.group=1;UNIX.mode=0644;UNIX.owner=1019;
/edgar/daily-index/master.20131122.idx
250 End of list
Traceback (most recent call last):
File "", line 10, in <module>
resp = ftp_client.sendcmd("MLST /edgar/daily-index/master.20131124.idx")
File "lib/python2.7/ftplib.py", line 244, in sendcmd
return self.getresp()
File "lib/python2.7/ftplib.py", line 219, in getresp
raise error_perm, resp
ftplib.error_perm: 550 '/edgar/daily-index/master.20131124.idx' cannot be listed
As expected, listing a non-existent file generates an exception.
=> Since I know that the Edgar FTP server will surely have the directory /edgar/daily-index/, my script can do the following to avoid raising exceptions due to non-existent files:
a) list this directory.
b) download the required file(s) if they are present in this listing. To check the listing I typically perform a regexp search on the list of strings that the listing operation returns.
For example this script tries to download files for the past three days. If a file is found for a certain date then it is downloaded, else nothing happens.
import ftplib
import re
from datetime import date, timedelta

ftp_client = ftplib.FTP("ftp.sec.gov", "anonymous", "MY.EMAIL@gmail.com")
listing = []
# List the directory and store each directory entry as a string in an array
ftp_client.retrlines("LIST /edgar/daily-index", listing.append)
# go back 1, 2 and 3 days
for diff in [1, 2, 3]:
    today = (date.today() - timedelta(days=diff)).strftime("%Y%m%d")
    month = (date.today() - timedelta(days=diff)).strftime("%Y_%m")
    # the absolute path of the file we want to download - if it indeed exists
    file_path = "/edgar/daily-index/master.%(date)s.idx" % { "date": today }
    # create a regex to match the file's name
    pattern = re.compile("master.%(date)s.idx" % { "date": today })
    # filter out elements from the listing that match the pattern
    found = filter(lambda x: re.search(pattern, x) != None, listing)
    if len(found) > 0:
        ftp_client.retrbinary(
            "RETR %(file_path)s" % { "file_path": file_path },
            open(
                './edgar/daily-index/%(month)s/master.%(date)s.idx' % {
                    "month": month,
                    "date": today
                }, 'wb'
            ).write
        )
=> Interestingly, there are situations where we cannot list a directory on the FTP server. The Edgar FTP server, for example, disallows listing on /edgar/data because it contains far too many sub-directories. In such cases I can't use the "list and check for existence" approach described here; instead I have to use exception handling in my downloader script to recover from attempts to access non-existent files/directories.
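A sketch of that exception-handling fallback (the path below is illustrative, not a real Edgar file):
import ftplib

ftp_client = ftplib.FTP("ftp.sec.gov", "anonymous", "MY.EMAIL@gmail.com")
path = "/edgar/data/some-company/some-file.idx"  # hypothetical path
try:
    # MLST raises error_perm (550) when the path does not exist
    ftp_client.sendcmd("MLST %s" % path)
except ftplib.error_perm:
    print("%s does not exist, skipping download" % path)
else:
    with open("some-file.idx", "wb") as out:
        ftp_client.retrbinary("RETR %s" % path, out.write)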

from ftplib import FTP

ftp = FTP()
ftp.connect(hostname, 21)
ftp.login(username, password)
try:
    ftp.cwd('your folder name')
    # do the code for a successful cd
except Exception:
    # do the code for when the folder does not exist
    pass
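If you need to repeat this check without losing your place on the server, a sketch of a helper that changes back afterwards (the function name directory_exists is mine):
import ftplib

def directory_exists(ftp, path):
    # Try to cwd into the path; on success, cwd back and report True
    original = ftp.pwd()
    try:
        ftp.cwd(path)
    except ftplib.error_perm:
        return False
    ftp.cwd(original)
    return True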

Related

How can I know the dates of the files uploaded in azure file share

I have a file share in Azure, and I want to list the files it contains, as well as the date of each upload, so that I can see the most recently uploaded files.
I managed to list the files; however, I cannot see the dates of upload. Here is my code:
from azure.storage.file import FileService

file_service = FileService(account_name='', account_key='')
generator = list(file_service.list_directories_and_files(''))
try:
    for file_or_dir in generator:
        properties = file_service.get_file_properties(share_name='', directory_name="", file_name=file_or_dir.name)
        print(file_or_dir.name, file_or_dir.properties.__dict__)
except ResourceNotFoundError as ex:
    print('ResourceNotFoundError:', ex.message)
When I print the properties __dict__, I get this result:
file_name.zip {'last_modified': None, ...}
UPDATE
With this code, it works:
from azure.storage.file import FileService

file_service = FileService(account_name='', account_key='')
generator = list(file_service.list_directories_and_files(''))
try:
    for file_or_dir in generator:
        file_in = file_service.get_file_properties(share_name='', directory_name="", file_name=file_or_dir.name)
        print(file_or_dir.name, file_in.properties.last_modified)
except ResourceNotFoundError as ex:
    print('ResourceNotFoundError:', ex.message)
This is expected behavior. When you list files and directories in an Azure File Share, very minimal information is returned; for files, only the file name and size are returned.
To get other properties of a file, you will need to separately call get_file_properties for each file in the list. The result of this operation will contain the last modified date of the file.
Update
Please try something like (untested code):
try:
    for file_or_dir in generator:
        properties = file_service.get_file_properties(share_name="share-name", directory_name="", file_name=file_or_dir.name)
        print(file_or_dir.name, properties.__dict__)
except Exception as ex:
    print(ex)

Python ftplib error 426 when putting files on iSeries

I have a peculiar issue that I can't seem to fix on my own.
I'm attempting to FTP a list of files in a directory over to an iSeries IFS using Python's ftplib library.
Note, the files are in a single subdirectory down from the python script.
Below is an excerpt of the code that is giving me trouble:
from ftplib import FTP
import os

localpath = os.getcwd() + '/Files/'

def putFiles():
    hostname = 'host.name.com'
    username = 'myuser'
    password = 'mypassword'
    myftp = FTP(hostname)
    myftp.login(username, password)
    myftp.cwd('/STUFF/HERE/')
    for file in os.listdir(localpath):
        if file.endswith('.csv'):
            try:
                file = localpath + file
                print 'Attempting to move ' + file
                myftp.storbinary("STOR " + file, open(file, 'rb'))
            except Exception as e:
                print(e)
The specific error that is being thrown is:
Attempting to move /home/doug/Files/FILE.csv
426-Unable to open or create file /home/doug/Files to receive data.
426 Data transfer ended.
What I've done so far to troubleshoot:
Initially I thought this was a permissions issue on the directory containing my files. I ran chmod 777 /home/doug/Files and re-ran my script, but the same exception occurred.
Next I assumed there was an issue between my machine and the iSeries. I validated that I could indeed put files using ftp from a shell: I was able to put the file on the iSeries IFS successfully that way.
Thanks!
Solution
from ftplib import FTP
import os

localpath = os.getcwd() + '/Files/'

def putFiles():
    hostname = 'host.name.com'
    username = 'myuser'
    password = 'mypassword'
    myftp = FTP(hostname)
    myftp.login(username, password)
    myftp.cwd('/STUFF/HERE/')
    for csv in os.listdir(localpath):
        if csv.endswith('.csv'):
            try:
                myftp.storbinary("STOR " + csv, open(localpath + csv, 'rb'))
            except Exception as e:
                print(e)
As written, your code is trying to execute the following FTP command:
STOR /home/doug/Files/FILE.csv
Meaning it is trying to create /home/doug/Files/FILE.csv on the IFS. Is this what you want? I suspect that it isn't, given that you bothered to change the remote directory to /STUFF/HERE/.
If you are trying to issue the command
STOR FILE.csv
then you have to be careful how you deal with the Python variable that you've named file. In general, it's not recommended that you reassign a variable that is the target of a for loop, precisely because this type of confusion can occur. Choose a different variable name for localpath + file, and use that in your open(..., 'rb').
Incidentally, it looks like you're using Python 2, since there is a bare print statement with no parentheses. I'm sure you're aware that Python 3 is recommended by now, but if you do stick to Python 2, it's recommended that you avoid using file as a variable name, because it actually means something in Python 2 (it's the name of a type; specifically, the return type of the open function).
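To illustrate, a sketch of the same loop in Python 3 with a distinct variable name for the local path (host, credentials, and directories are the question's placeholders):
from ftplib import FTP
import os

localpath = os.getcwd() + '/Files/'

def put_files():
    myftp = FTP('host.name.com')
    myftp.login('myuser', 'mypassword')
    myftp.cwd('/STUFF/HERE/')
    for name in os.listdir(localpath):
        if name.endswith('.csv'):
            fullpath = localpath + name  # local path, used only for reading
            print('Attempting to move ' + fullpath)
            with open(fullpath, 'rb') as f:
                myftp.storbinary("STOR " + name, f)  # remote name only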

access remote files on server with smb protocol python3

I have a remote server with some files.
smb://ftpsrv/public/
I can be authorized there as an anonymous user. In java I could simply write this code:
SmbFile root = new SmbFile(SMB_ROOT);
And get the ability to work with files inside (that's all I need, one line!). But I can't find how to manage this task in Python 3; there are a lot of resources, but I think they are not relevant to my problem, because they are frequently tailored to Python 2 and other old approaches. Is there some simple way, similar to the Java code above?
Or can somebody provide a real working solution if, for example, I want to access the file fgg.txt in the smb://ftpsrv/public/ folder? Is there a handy lib to tackle this problem?
For example, from the project's site:
import tempfile
from smb.SMBConnection import SMBConnection
# There will be some mechanism to capture userID, password, client_machine_name, server_name and server_ip
# client_machine_name can be an arbitrary ASCII string
# server_name should match the remote machine name, or else the connection will be rejected
conn = SMBConnection(userID, password, client_machine_name, server_name, use_ntlm_v2 = True)
assert conn.connect(server_ip, 139)
file_obj = tempfile.NamedTemporaryFile()
file_attributes, filesize = conn.retrieveFile('smbtest', '/rfc1001.txt', file_obj)
# Retrieved file contents are inside file_obj
# Do what you need with the file_obj and then close it
# Note that the file obj is positioned at the end-of-file,
# so you might need to perform a file_obj.seek() if you need
# to read from the beginning
file_obj.close()
Do I seriously need to provide all of these details: conn = SMBConnection(userID, password, client_machine_name, server_name, use_ntlm_v2 = True)?
A simple example of opening a file using urllib and pysmb in Python 3
import urllib
from smb.SMBHandler import SMBHandler
opener = urllib.request.build_opener(SMBHandler)
fh = opener.open('smb://host/share/file.txt')
data = fh.read()
fh.close()
I haven't got an anonymous SMB share ready to test it with, but this code should work.
urllib2 is the python 2 package, in python 3 it was renamed to just urllib and some stuff got moved around.
I think you were asking for Linux, but for completeness I'll share how it works on Windows.
On Windows, it seems that Samba access is supported out of the box with Python's standard library functions:
import glob, os

with open(r'\\USER1-PC\Users\Public\test.txt', 'w') as f:
    f.write('hello')  # write a file on a distant Samba share

for f in glob.glob(r'\\USER1-PC\Users\**\*', recursive=True):
    print(f)  # glob works too
    if os.path.isfile(f):
        print(os.path.getmtime(f))  # we can get filesystem information

If "string" in variable: not working Python3

The problem: the string specified is not being found in the text file, why?
Description: I've got a simple Python script here that checks to see if a file exists, if it does, check the integrity, if it passes, stop. If it fails, recreate the file. If the file doesn't exist make it.
I've got everything working but the integrity check. The integrity check right now simply looks for a string called "[driveC]"; I'd like to make it more thorough, but this is what I've got going so far.
Any thoughts? A workaround is to convert the config file into a list variable and search through the list for the string, but I'd like to use this method as it seems more scalable.
My code: (also can be seen here https://hastebin.com/umitibigib.py) line 55 is the check that is failing
### io testing
import os.path
try:
    from configparser import ConfigParser
except ImportError:
    from ConfigParser import ConfigParser  # ver. < 3.0

# variables
drives_given = ['C', 'D']

# instantiate config parser
config = ConfigParser()
cfg_path = os.path.exists('smartPyConfig.ini')

# A config file was not found, let's make one
def create_config_file():
    cfgfile = open("smartPyConfig.ini", 'w')
    print("A new config file was created")
    print("")
    print("Adding thresholds and drive sections")
    # Add general settings
    config.add_section('general')
    config.set('general', 'logging_level', 'debug')
    # Add smartctl threshold values
    config.add_section('standard_thresholds')
    config.set('standard_thresholds', 'threshold_value_raw_read_error_rate_norm', '101')
    config.set('standard_thresholds', 'threshold_value_reallocated_sector_count_norm', '105')
    config.set('standard_thresholds', 'threshold_value_seek_error_rate_norm', '101')
    config.set('standard_thresholds', 'threshold_value_power_on_hours_raw', '1000')
    config.set('standard_thresholds', 'threshold_value_temperature_celsius_raw', '100')
    config.set('standard_thresholds', 'threshold_value_reported_uncorrect_raw', '100')
    config.set('standard_thresholds', 'threshold_value_hardware_ecc_recovered_norm', '100')
    config.set('standard_thresholds', 'threshold_value_offline_uncorrectable_raw', '100')
    config.set('standard_thresholds', 'threshold_value_free_fall_sensor_raw', '100')
    config.set('standard_thresholds', 'threshold_value_udma_crc_error_count_norm', '350')
    # DONE
    # Create a section for each drive we were given
    # for every drive letter listed in the drives_given list, make a section for it
    for i in drives_given:
        config.add_section('drive%s' % i)
    # Write out the data and close the file
    config.write(cfgfile)
    cfgfile.close()
    print("Config file created and written to disk.")

# Check to see if file is healthy, if not recreate it.
def check_file_integrity():
    with open("smartPyConfig.ini", 'r') as file:
        if "[driveC]" in file:  # Not working
            print("found drive C in config file.")
            print("finished")
        else:
            print("drive C not found in config file.")
            create_config_file()

# check for a config file
def check_for_config():
    # Check to see if the file exists
    try:
        if cfg_path:  # if cfg_path is true (true = the file was found) do this
            print("Config file found!")
            print("Checking config file..")
            check_file_integrity()
        else:  # if cfg_path is not true, file was not found, do this
            print("Config file not found")
            print("Creating config file.")
            create_config_file()
    except Exception as e:
        print("An exception occurred, printing exception")
        print(e)

check_for_config()
The config file it's checking:
[general]
logging_level = debug
[standard_thresholds]
threshold_value_raw_read_error_rate_norm = 101
threshold_value_reallocated_sector_count_norm = 105
threshold_value_seek_error_rate_norm = 101
threshold_value_power_on_hours_raw = 1000
threshold_value_temperature_celsius_raw = 100
threshold_value_reported_uncorrect_raw = 100
threshold_value_hardware_ecc_recovered_norm = 100
threshold_value_offline_uncorrectable_raw = 100
threshold_value_free_fall_sensor_raw = 100
threshold_value_udma_crc_error_count_norm = 350
[driveC]
[driveD]
Your variable file is the file object itself, not the contents of the file. You may want something like:
if "[driveC]" in file.read():
... which tests to see if that string is in the contents of the file.
What you originally had checks for an exact match on some line of the file, since the in operator will iterate over the file's lines. This didn't work because each line ends with a newline character, which you did not include in your target string. Like this:
if "[driveC]\n" in file:
If you need it to match exactly that text on a single line (with not even any whitespace on the same line), that will work. As a bonus, it will stop as soon as it finds the match instead of reading the whole file (although for smallish files, reading the whole file is probably just as fast or faster).
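Since the file is an INI file that the script already parses with ConfigParser, another option (a sketch, not tested against your full script; it reuses your create_config_file) is to let the parser do the integrity check:
from configparser import ConfigParser

def check_file_integrity():
    config = ConfigParser()
    config.read('smartPyConfig.ini')
    if config.has_section('driveC'):
        print("found drive C in config file.")
        print("finished")
    else:
        print("drive C not found in config file.")
        create_config_file()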

Access a local file, but ensure it is up-to-date

How can I use the Python standard library to get a file object, silently ensuring it's up-to-date from some other location?
A program I'm working on needs to access a set of files locally; they're
just normal files.
But those files are local cached copies of documents available at remote
URLs — each file has a canonical URL for that file's content.
(I write here about HTTP URLs, but I'm looking for a solution that isn't specific to any particular remote fetching protocol.)
I'd like an API for ‘get_file_from_cache’ that looks something like:
file_urls = {
    "/path/to/foo.txt": "http://example.org/spam/",
    "other/path/bar.data": "https://example.net/beans/flonk.xml",
}

for (filename, url) in file_urls.items():
    infile = get_file_from_cache(filename, canonical=url)
    do_stuff_with(infile.read())
If the local file's modification timestamp is not significantly
earlier than the Last-Modified timestamp for the document at the
corresponding URL, get_file_from_cache just returns the file object
without changing the file.
The local file might be out of date (its modification timestamp may be
significantly older than the Last-Modified timestamp from the
corresponding URL). In that case, get_file_from_cache should first
read the document's contents into the file, then return the file
object.
The local file may not yet exist. In that case, get_file_from_cache
should first read the document content from the corresponding URL,
create the local file, and then return the file object.
The remote URL may not be available for some reason. In that case,
get_file_from_cache should simply return the file object, or if that
can't be done, raise an error.
So this is something similar to an HTTP object cache. Except where those
are usually URL-focussed with the local files a hidden implementation
detail, I want an API that focusses on the local files, with the remote
requests a hidden implementation detail.
Does anything like this exist in the Python library, or as simple code
using it? With or without the specifics of HTTP and URLs, is there some
generic caching recipe already implemented with the standard library?
This local file cache (ignoring the specifics of URLs and network access) seems like exactly the kind of thing that is easy to get wrong in countless ways, and so should have a single obvious implementation available.
Am I in luck? What do you advise?
From a quick Googling I couldn't find an existing library that can do this, although I'd be surprised if there weren't such a thing. :)
Anyway, here's one way to do it using the popular Requests module. It'd be pretty easy to adapt this code to use urllib / urllib2, though.
#! /usr/bin/env python
''' Download a file if it doesn't yet exist in offline cache, or if the online
version is more than age seconds newer than the cached version.

Example code for
http://stackoverflow.com/questions/26436641/access-a-local-file-but-ensure-it-is-up-to-date

Written by PM 2Ring 2014.10.18
'''
import sys
import os
import email.utils
import requests

cache_path = 'offline_cache'

# Translate local file names in cache_path to URLs
file_urls = {
    'example1.html': 'http://www.example.com/',
    'badfile': 'http://httpbin.org/status/404',
    'example2.html': 'http://www.example.org/index.html',
}

def get_headers(url):
    resp = requests.head(url)
    print "Status: %d" % resp.status_code
    resp.raise_for_status()
    for k, v in resp.headers.items():
        print '%-16s : %s' % (k, v)

def get_url_mtime(url):
    ''' Get last modified time of an online file from the headers
        and convert to a timestamp
    '''
    resp = requests.head(url)
    resp.raise_for_status()
    t = email.utils.parsedate_tz(resp.headers['last-modified'])
    return email.utils.mktime_tz(t)

def download(url, fname):
    ''' Download url to fname, setting mtime of file to match url '''
    print >>sys.stderr, "Downloading '%s' to '%s'" % (url, fname)
    resp = requests.get(url)
    #print "Status: %d" % resp.status_code
    resp.raise_for_status()
    t = email.utils.parsedate_tz(resp.headers['last-modified'])
    timestamp = email.utils.mktime_tz(t)
    #print 'last-modified', timestamp
    with open(fname, 'wb') as f:
        f.write(resp.content)
    os.utime(fname, (timestamp, timestamp))

def open_cached(basename, mode='r', age=0):
    ''' Open a cached file.
        Download it if it doesn't yet exist in cache, or if the online
        version is more than age seconds newer than the cached version.'''
    fname = os.path.join(cache_path, basename)
    url = file_urls[basename]
    #print fname, url
    if os.path.exists(fname):
        # Check if online version is sufficiently newer than offline version
        file_mtime = os.path.getmtime(fname)
        url_mtime = get_url_mtime(url)
        if url_mtime > age + file_mtime:
            download(url, fname)
    else:
        download(url, fname)
    return open(fname, mode)

def main():
    for fname in ('example1.html', 'badfile', 'example2.html'):
        print fname
        try:
            with open_cached(fname, 'r') as f:
                for i, line in enumerate(f, 1):
                    print "%3d: %s" % (i, line.rstrip())
        except requests.exceptions.HTTPError, e:
            print >>sys.stderr, "%s '%s' = '%s'" % (e, file_urls[fname], fname)
        print

if __name__ == "__main__":
    main()
Of course, for real-world use you should add some proper error checking.
You may notice that I've defined a function get_headers(url) which never gets called; I used it during development & figured it might come in handy when expanding this program, so I left it in. :)
