The problem: the string I'm searching for is not being found in the text file. Why?
Description: I've got a simple Python script that checks whether a file exists. If it does, the script checks its integrity; if that passes, it stops, and if it fails, it recreates the file. If the file doesn't exist, the script creates it.
I've got everything working except the integrity check. Right now the check simply looks for the string "[driveC]". I'd like to make it more thorough, but this is what I've got so far.
Any thoughts? A workaround is to read the config file into a list and search the list for the string, but I'd like to use this method as it seems scalable.
My code (also available at https://hastebin.com/umitibigib.py); line 55 is the check that is failing:
###io testing
import os.path
try:
    from configparser import ConfigParser
except ImportError:
    from ConfigParser import ConfigParser # ver. < 3.0
#variables
drives_given = [ 'C', 'D']
# instantiate config parser
config = ConfigParser()
cfg_path = os.path.exists('smartPyConfig.ini')
#A config file was not found, let's make one
def create_config_file():
    cfgfile = open("smartPyConfig.ini",'w')
    print("A new config file was created")
    print("")
    print("Adding thresholds and drive sections")
    #Add general settings
    config.add_section('general')
    config.set('general', 'logging_level', 'debug')
    #Add smartctl threshold values
    config.add_section('standard_thresholds')
    config.set('standard_thresholds', 'threshold_value_raw_read_error_rate_norm', '101')
    config.set('standard_thresholds', 'threshold_value_reallocated_sector_count_norm', '105')
    config.set('standard_thresholds', 'threshold_value_seek_error_rate_norm', '101')
    config.set('standard_thresholds', 'threshold_value_power_on_hours_raw', '1000')
    config.set('standard_thresholds', 'threshold_value_temperature_celsius_raw', '100')
    config.set('standard_thresholds', 'threshold_value_reported_uncorrect_raw', '100')
    config.set('standard_thresholds', 'threshold_value_hardware_ecc_recovered_norm', '100')
    config.set('standard_thresholds', 'threshold_value_offline_uncorrectable_raw', '100')
    config.set('standard_thresholds', 'threshold_value_free_fall_sensor_raw', '100')
    config.set('standard_thresholds', 'threshold_value_udma_crc_error_count_norm', '350')
    #DONE
    #Create a section for each drive we were given
    #for every drive letter listed in the drives_given list, make a section for it
    for i in drives_given:
        config.add_section('drive%s' % i)
    #Write out the data and close the file
    config.write(cfgfile)
    cfgfile.close()
    print("Config file created and written to disk.")
#Check to see if file is healthy, if not recreate it.
def check_file_integrity():
    with open("smartPyConfig.ini", 'r') as file:
        if "[driveC]" in file: #Not working
            print("found drive C in config file.")
            print("finished")
        else:
            print("drive C not found in config file.")
            create_config_file()
#check for a config file
def check_for_config():
    # Check to see if the file exists
    try:
        if cfg_path: #if cfg_path is true (true = the file was found) do this
            print("Config file found!")
            print("Checking config file..")
            check_file_integrity()
        else: #if cfg_path is not true, file was not found, do this
            print("Config file not found")
            print("Creating config file.")
            create_config_file()
    except Exception as e:
        print("An exception occured, printing exception")
        print(e)
check_for_config()
The config file it's checking:
[general]
logging_level = debug
[standard_thresholds]
threshold_value_raw_read_error_rate_norm = 101
threshold_value_reallocated_sector_count_norm = 105
threshold_value_seek_error_rate_norm = 101
threshold_value_power_on_hours_raw = 1000
threshold_value_temperature_celsius_raw = 100
threshold_value_reported_uncorrect_raw = 100
threshold_value_hardware_ecc_recovered_norm = 100
threshold_value_offline_uncorrectable_raw = 100
threshold_value_free_fall_sensor_raw = 100
threshold_value_udma_crc_error_count_norm = 350
[driveC]
[driveD]
Your variable file is the file object, not the contents of the file. You may want something like:
if "[driveC]" in file.read():
... which tests whether that string occurs anywhere in the contents of the file.
What you originally had checks for an exact match against a whole line of the file, because the in operator iterates over the file's lines. It didn't work because each line ends with a newline character, which your target string did not include. This version would match:
if "[driveC]\n" in file:
If you need to match exactly that text on a single line (with not even any whitespace on the same line), that will work. As a bonus, it stops as soon as it finds the match instead of reading the whole file (although for smallish files, reading the whole file is probably just as fast or faster).
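The three behaviours can be compared in a small self-contained sketch. The temporary file, the helper name section_present and the use of strip() to tolerate trailing whitespace are assumptions made for the demonstration, not part of the original script. The last check uses ConfigParser.has_section, which may be a more thorough option since the script already writes the file with ConfigParser (Python 3 import shown).

```python
import os
import tempfile
from configparser import ConfigParser

def section_present(path, header):
    """Return True if some line equals `header` once whitespace is stripped."""
    with open(path, "r") as handle:
        return any(line.strip() == header for line in handle)

# Build a throwaway config file for the demonstration.
fd, demo_path = tempfile.mkstemp(suffix=".ini")
with os.fdopen(fd, "w") as handle:
    handle.write("[general]\nlogging_level = debug\n[driveC]\n[driveD]\n")

with open(demo_path) as handle:
    exact_no_newline = "[driveC]" in handle        # compared against whole lines
with open(demo_path) as handle:
    exact_with_newline = "[driveC]\n" in handle    # equals one complete line
with open(demo_path) as handle:
    substring_match = "[driveC]" in handle.read()  # substring test on the text

stripped_match = section_present(demo_path, "[driveC]")

# Let the parser decide whether the section exists.
parser = ConfigParser()
parser.read(demo_path)
parsed_match = parser.has_section("driveC")

os.remove(demo_path)
print(exact_no_newline, exact_with_newline, substring_match,
      stripped_match, parsed_match)
```

Only the first check fails, which is the behaviour described in the question.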
Related
I am building a Python script that edits many files by searching for and replacing words in each file.
There is an original file named: C:\python 3.5/remedy line 1.ahk
There is a file containing the words I want to replace (search words) in the original document, and a text file listing the new words I would like placed into the final document.
The script runs and works perfectly. The final document is created and named after a line in the final text file (code begins on line 72), so I can tell what the final product is by looking at it. The file is originally opened as output = open("C:\python 3.5\output.ahk", 'w') and is renamed later in the script based on line 37. All of that works fine.
The seemingly simple part I can't figure out is how to move this one file to the directory where it belongs. That directory is created from the same line that the file gets its name from (code starts on line 82). How do I move my file into a directory that the script has created, i.e. one whose name comes from a variable (code starts on line 84)?
import shutil
#below is where your modified file sits, before we move it into it's own directory named dst, based on a variable #mainnewdir
srcdir = r'C:\python 3.5/'+(justfilename)
dst = (mainnewdir)+(justfilename)
shutil.copyfile(src, dst)
Why does it format the path with extra \ in the code?
Why does it not give me an error whether I use a / or a \ slash?
Here is the entire code; as I said, only the last part, moving the file, does not work:
import os
import linecache
import sys
import string
import re
## information/replacingvalues.txt this is the text of the values you want in your final document
#information = open("C:\python 3.5\replacingvalues.txt", 'r')
information = open("C:\python 3.5/replacingvalues.txt", 'r')
# information = open("C:\Program Files (x86)\Python35- 32\Scripts\Text_Find_and_Replace\information/replacingvalues.txt",
# Text_Find_and_Replace\Result/output.txt This is the dir and the sum or final document
# output = open("C:\python 3.5\output.ahk", 'w')
createblank = open ("C:\python 3.5/output.ahk", 'w')
createblank.close()
output = open("C:\python 3.5\output.ahk", 'w')
# field = open("C:\Program Files (x86)\Python35- 32\Scripts\Text_Find_and_Replace\Field/values.txt"
# Field is the file or words you will be replacing
field = open("C:\python 3.5/values.txt", 'r')
# modified code for autohot key
# Text_Find_and_Replace\Test/remedy line 1.ahk is the original doc you want modified
with open("C:\python 3.5/remedy line 1.ahk", 'r') as myfile:
inline = myfile.read()
## remedy line 1.ahk
informations = []
fields = []
dictionary = {}
i = 0
for line in information:
informations.append(line.splitlines())
for lines in field:
fields.append(lines.split())
i = i + 1;
if (len(fields) != len(informations)):
print("replacing values and values have different numbers")
exit();
else:
for i in range(0, i):
rightvalue = str(informations[i])
rightvalue = rightvalue.strip('[]')
rightvalue = rightvalue[1:-1]
leftvalue = str(fields[i])
leftvalue = leftvalue.strip('[]')
leftvalue = leftvalue.strip("'")
dictionary[leftvalue] = rightvalue
robj = re.compile('|'.join(dictionary.keys()))
result = robj.sub(lambda m: dictionary[m.group(0)], inline)
output.write(result)
information.close;
output.close;
field.close;
output.close()
import os
import linecache
linecache.clearcache()
newfilename= linecache.getline("C:\python 3.5/remedy line 1.txt",37)
filename = ("C:\python 3.5/output.ahk")
os.rename(filename, newfilename.strip())
#os.rename(filename, newfilename.strip()+".ahk")
linecache.clearcache()
############## below will create a new directory based on the the word or words in line 37 of the txt file.
newdirname= linecache.getline("C:\python 3.5/remedy line 1.txt",37)
#newpath = r'C:\pythontest\automadedir'
#below removes the /n ie new line raw assci
justfilename = (newdirname).strip()
#below removes the .txt from the rest of the justfilename..
autocreateddir = (justfilename).strip(".txt")
# below is an example of combining a string and a variable
# below makes the variable up that will be the name of the new directory based on reading line 37 of a text file above
mainnewdir= r'C:\pythontest\automadedir/'+(autocreateddir)
if not os.path.exists(mainnewdir):
os.makedirs(mainnewdir)
linecache.clearcache()
# ####################################################
#below is where your modified file sits, before we move it into it's own directory named dst, based on a variable #mainnewdir
srcdir = r'C:\python 3.5/'+(justfilename)
dst = (mainnewdir)+(justfilename)
shutil.copyfile(src, dst)
Backslashes do not have a mind of their own.
When you paste Windows paths as-is and they contain \n, \r, \b, \x, \v or \U (Python 3), among others (the Python reference lists all of them), you are using escape sequences without noticing it.
When the escape sequence doesn't exist (e.g. \p), the backslash is kept and the path works. But when the sequence is a known one, the resulting filename is usually invalid. That explains the apparent randomness of the issue.
To paste Windows paths safely without changing or escaping them, use the raw prefix:
my_file = r"C:\temp\foo.txt"
so the backslashes won't be interpreted. One exception: a raw string cannot end with a single backslash, so a trailing backslash still has to be added separately.
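A short sketch makes the difference visible. The paths are made-up examples; ntpath is the Windows flavour of os.path from the standard library and is used here only so the result is the same on every platform.

```python
import ntpath  # on Windows, os.path is this module

escaped = "C:\temp\new.txt"      # \t and \n become a tab and a newline
raw = r"C:\temp\new.txt"         # backslashes are kept literally
doubled = "C:\\temp\\new.txt"    # same text as the raw string

print(repr(escaped))             # shows the tab and newline characters
print(raw == doubled)            # the two spellings are identical

# A raw string cannot end in a single backslash, so add it separately.
folder = r"C:\temp" + "\\"

# Joining the pieces avoids hand-written separators altogether.
joined = ntpath.join(r"C:\temp", "new.txt")
print(folder, joined)
```

Building the destination with a join function instead of string concatenation also avoids a missing separator between a directory name and a file name.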
I'm currently writing an open source library for a container format, which involves modifying zip archives. For that I use Python's built-in zipfile module. Because of some limitations I decided to modify the module and ship it with my library. The modifications include a patch from the Python issue tracker for removing entries from a zip file: https://bugs.python.org/issue6818
More specifically, I included zipfile.remove.2.patch from ubershmekel.
After some modifications for Python 2.7, the patch works fine according to the shipped unit tests.
Nevertheless, I run into problems when removing, adding, and removing and re-adding files without closing the zipfile in between.
Error
Traceback (most recent call last):
File "/home/martin/git/pyCombineArchive/tests/test_zipfile.py", line 1590, in test_delete_add_no_close
self.assertEqual(zf.read(fname), data)
File "/home/martin/git/pyCombineArchive/combinearchive/custom_zip.py", line 948, in read
with self.open(name, "r", pwd) as fp:
File "/home/martin/git/pyCombineArchive/combinearchive/custom_zip.py", line 1003, in open
% (zinfo.orig_filename, fname))
BadZipFile: File name in directory 'foo.txt' and header 'bar.txt' differ.
That is, the zip file itself is fine, but somehow the central directory or the entry header gets messed up.
This unit test reproduces the error:
def test_delete_add_no_close(self):
    fname_list = ["foo.txt", "bar.txt", "blu.bla", "sup.bro", "rollah"]
    data_list = [''.join([chr(randint(0, 255)) for i in range(100)]) for i in range(len(fname_list))]
    # add some files to the zip
    with zipfile.ZipFile(TESTFN, "w") as zf:
        for fname, data in zip(fname_list, data_list):
            zf.writestr(fname, data)
    for no in range(0, 2):
        with zipfile.ZipFile(TESTFN, "a") as zf:
            zf.remove(fname_list[no])
            zf.writestr(fname_list[no], data_list[no])
            zf.remove(fname_list[no+1])
            zf.writestr(fname_list[no+1], data_list[no+1])
            # try to access prior deleted/added file and prior last file (which got moved, while delete)
            for fname, data in zip(fname_list, data_list):
                self.assertEqual(zf.read(fname), data)
My modified zipfile module and the complete unittest file can be found in this gist: https://gist.github.com/FreakyBytes/30a6f9866154d82f1c3863f2e4969cc4
After some intensive debugging, I'm fairly sure something goes wrong when moving the remaining chunks (the ones stored after the removed file). So I rewrote that part of the code to copy these files/chunks one at a time. I also rewrite the file header for each of them (to make sure it is valid) and the central directory at the end of the zipfile.
My remove function now looks like this:
def remove(self, member):
    """Remove a file from the archive. Only works if the ZipFile was opened
    with mode 'a'."""
    if "a" not in self.mode:
        raise RuntimeError('remove() requires mode "a"')
    if not self.fp:
        raise RuntimeError(
            "Attempt to modify ZIP archive that was already closed")
    fp = self.fp
    # Make sure we have an info object
    if isinstance(member, ZipInfo):
        # 'member' is already an info object
        zinfo = member
    else:
        # Get info object for member
        zinfo = self.getinfo(member)
    # start at the pos of the first member (smallest offset)
    position = min([info.header_offset for info in self.filelist]) # start at the beginning of first file
    for info in self.filelist:
        fileheader = info.FileHeader()
        # is member after delete one?
        if info.header_offset > zinfo.header_offset and info != zinfo:
            # rewrite FileHeader and copy compressed data
            # Skip the file header:
            fp.seek(info.header_offset)
            fheader = fp.read(sizeFileHeader)
            if fheader[0:4] != stringFileHeader:
                raise BadZipFile("Bad magic number for file header")
            fheader = struct.unpack(structFileHeader, fheader)
            fname = fp.read(fheader[_FH_FILENAME_LENGTH])
            if fheader[_FH_EXTRA_FIELD_LENGTH]:
                fp.read(fheader[_FH_EXTRA_FIELD_LENGTH])
            if zinfo.flag_bits & 0x800:
                # UTF-8 filename
                fname_str = fname.decode("utf-8")
            else:
                fname_str = fname.decode("cp437")
            if fname_str != info.orig_filename:
                if not self._filePassed:
                    fp.close()
                raise BadZipFile(
                    'File name in directory %r and header %r differ.'
                    % (zinfo.orig_filename, fname))
            # read the actual data
            data = fp.read(fheader[_FH_COMPRESSED_SIZE])
            # modify info obj
            info.header_offset = position
            # jump to new position
            fp.seek(info.header_offset, 0)
            # write fileheader and data
            fp.write(fileheader)
            fp.write(data)
            if zinfo.flag_bits & _FHF_HAS_DATA_DESCRIPTOR:
                # Write CRC and file sizes after the file data
                fp.write(struct.pack("<LLL", info.CRC, info.compress_size,
                                     info.file_size))
            # update position
            fp.flush()
            position = fp.tell()
        elif info != zinfo:
            # move to next position
            position = position + info.compress_size + len(fileheader) + self._get_data_descriptor_size(info)
    # Fix class members with state
    self.start_dir = position
    self._didModify = True
    self.filelist.remove(zinfo)
    del self.NameToInfo[zinfo.filename]
    # write new central directory (includes truncate)
    fp.seek(position, 0)
    self._write_central_dir()
    fp.seek(self.start_dir, 0) # jump to the beginning of the central directory, so it gets overridden at close()
You can find the complete code in the latest revision of the gist: https://gist.github.com/FreakyBytes/30a6f9866154d82f1c3863f2e4969cc4
or in the repo of the library I'm writing: https://github.com/FreakyBytes/pyCombineArchive
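For comparison while debugging, here is a sketch of a different and much simpler technique: instead of moving chunks in place, it copies every entry except the unwanted one into a new archive using only the stock zipfile module. This is not the patched remove() from the gist; the function name remove_member and the in-memory buffers are assumptions for illustration. It rewrites the whole archive, so it trades speed for not having to touch headers or the central directory by hand.

```python
import io
import zipfile

def remove_member(source, target, name):
    """Copy every entry except `name` from archive `source` into `target`."""
    with zipfile.ZipFile(source, "r") as src, \
            zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename != name:
                # Passing the ZipInfo keeps the name and timestamp of the entry.
                dst.writestr(info, src.read(info.filename))

# Demonstration with in-memory archives (hypothetical file names).
original = io.BytesIO()
with zipfile.ZipFile(original, "w") as archive:
    archive.writestr("foo.txt", "foo data")
    archive.writestr("bar.txt", "bar data")

rewritten = io.BytesIO()
remove_member(original, rewritten, "foo.txt")

with zipfile.ZipFile(rewritten) as archive:
    remaining = archive.namelist()
    bar_data = archive.read("bar.txt")
print(remaining, bar_data)
```

The output of such a rewrite can serve as a known-good reference to compare against the archive produced by the in-place version.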
The goal is to read a log file in real time, line by line (standard generator stuff). The catch is that the file name changes at various intervals. The name change can't be helped (the application dictates it, appending a time string), and it happens when the log file reaches roughly 2 MB (a guesstimate).
My approach was to create a file getter function that finds the file (or the new file) and passes it to the generator. I expected a 'File not found' error when the file changed names, but my test showed that the rename is blocked entirely because 'another program is using this file'. The name change must be allowed; this reader code cannot interfere with the application's logging at all.
import os
import time
import fnmatch
directory = '\\foo\\'
def fileGenerator(logFile):
    """ Run a line generator """
    logFile.seek(0,2)
    while True:
        line = logFile.readline()
        if not line:
            time.sleep(0.1)
            continue
        yield line
def fileGetter():
    """ Get the Logging File """
    matchedFiles = []
    for afile in os.listdir(directory):
        if fnmatch.fnmatch(afile,'amc_*.txt'):
            matchedFiles.append(afile)
    if len(matchedFiles)==1:
        #There was exactly one matching file found send it to the generator
        return os.path.join(directory,matchedFiles[0])
    else:
        #There either wasn't a file found or many matching
        #Error out and stop process... critical error
        raise RuntimeError("expected exactly one matching log file")
if __name__ == '__main__':
    filePath = fileGetter()
    try:
        logFile = open(filePath,"r")
    except Exception as e:
        #Catch the file not found and go back to the file path getter
        #Send the file back to the generator
        print e
    if logFile:
        loglines = fileGenerator(logFile)
        for line in loglines:
            #handle the line
            print line,
If you can't hold the file open while waiting for new content to be written to it, I suggest saving the file position you were last at and closing the file before you sleep, and then reopening the file and seeking to that point afterwards. You could also investigate filesystem notification systems if you care about spotting file additions or renames immediately.
import time
def log_reader():
    filename = "does_not_exist"
    filepos = 0
    while True:
        try:
            file = open(filename)
        except FileNotFoundError:
            filename = fileGetter()
            # if renamed files start empty, set filepos to zero here!
            continue
        file.seek(filepos)
        while True:
            line = file.readline()
            if not line:
                filepos = file.tell()
                file.close()
                time.sleep(0.1) # you may want to test different sleep lengths to avoid FS thrash
                break
            yield line
The opening and closing of the file may stress out your filesystem if you do it too much, so I'd suggest sleeping longer than your previous code did (but you may want to test to see how well your OS handles it if you care about how responsive your log reader is).
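The save-position, close, reopen idea can also be written as a plain function that a polling loop calls between sleeps. This is a minimal sketch under stated assumptions: the name read_new_lines and the temporary file are invented for the demonstration, and whether closing the handle is enough to let the other application rename the file depends on the operating system, so it needs testing against the real logger.

```python
import os
import tempfile

def read_new_lines(path, position):
    """Return (lines, new_position) for text added to `path` after `position`.

    The file is opened and closed on every call, so no handle stays open
    while the caller sleeps between polls.
    """
    lines = []
    with open(path, "r") as handle:
        handle.seek(position)
        while True:
            line = handle.readline()   # readline() keeps tell() usable
            if not line:
                break
            lines.append(line)
        position = handle.tell()
    return lines, position

# Demonstration: another writer appends between two polls.
fd, log_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(log_path, "a") as log:
    log.write("first\n")
first_batch, offset = read_new_lines(log_path, 0)
with open(log_path, "a") as log:
    log.write("second\nthird\n")
second_batch, offset = read_new_lines(log_path, offset)
os.remove(log_path)
print(first_batch, second_batch)
```

A caller would loop over this with a sleep in between, resetting the position to zero whenever the file getter reports a new file name.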
Currently I have this piece of code for python 2.7:
h = 0
for line in fileinput.input('HISTORY',inplace=1):
if line[0:2] == x:
h = h + 1
if h in AU:
line = line.replace(x,'AU')
if 'timestep' in line:
h = 0
sys.stdout.write(('\r%s%% ') % format(((os.stat('HISTORY').st_size / os.stat('HISTORY.bak').st_size)*100),'.1f'))
sys.stdout.write(line)
What I am having trouble with is the following line:
sys.stdout.write(('\r%s%% ') % format(((os.stat('HISTORY').st_size / os.stat('HISTORY.bak').st_size)*100),'.1f'))
I need this information to be outputted to the console ONLY and not into the HISTORY file.
This code creates a temporary copy of the input file, then scans this and rewrites the original file. It handles errors during processing the file so that the original data isn't lost during the re-write. It demonstrates how to write some data to stdout occasionally and other data back to the original file.
The temporary file creation was taken from this SO answer.
import fileinput
import os, shutil, tempfile
# create a copy of the source file into a system specified
# temporary directory. You could just put this in the original
# folder, if you wanted
def create_temp_copy(src_filename):
    temp_dir = tempfile.gettempdir()
    temp_path = os.path.join(temp_dir, 'temp-history.txt')
    shutil.copy2(src_filename,temp_path)
    return temp_path
# create a temporary copy of the input file
temp = create_temp_copy('HISTORY.txt')
# open up the input file for writing
dst = open('HISTORY.txt','w+')
for line in fileinput.input(temp):
    # Added a try/catch to handle errors during processing.
    # If this isn't present, any exceptions that are raised
    # during processing could cause unrecoverable loss of
    # the HISTORY file
    try:
        # some sort of replacement
        if line.startswith('e'):
            line = line.strip() + '#\n' # notice the newline here
        # occasional status updates to stdout
        if '0' in line:
            print 'info:',line.strip() # notice the removal of the newline
    except:
        # when a problem occurs, just output a message
        print 'Error processing input file'
    finally:
        # re-write the original input file
        # even if there are exceptions
        dst.write(line)
# deletes the temporary file
os.remove(temp)
# close the original file
dst.close()
If you only want the information to go to the console could you just use print instead?
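One caveat with that suggestion: while fileinput runs with inplace enabled, standard output is redirected into the file being rewritten, so print output ends up in the file as well. Standard error is not redirected, so writing progress there is one option. The sketch below assumes Python 3; the function name replace_in_place and the sample file are made up for the demonstration.

```python
import fileinput
import os
import sys
import tempfile

def replace_in_place(path, old, new):
    """Rewrite `path` in place and report progress on stderr only."""
    count = 0
    with fileinput.FileInput(files=path, inplace=True) as stream:
        for line in stream:
            count += 1
            # stderr still reaches the console during in-place editing.
            sys.stderr.write("\rprocessed %d lines" % count)
            # stdout is redirected into the file, so this rewrites the line.
            sys.stdout.write(line.replace(old, new))
    sys.stderr.write("\n")
    return count

# Demonstration on a throwaway file.
fd, history_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as handle:
    handle.write("ab\ncd\n")
line_count = replace_in_place(history_path, "a", "X")
with open(history_path) as handle:
    new_text = handle.read()
os.remove(history_path)
```

The progress text never appears in the rewritten file because only sys.stdout is swapped out.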
I'm using this script to connect to sample ftp server and list available directories:
from ftplib import FTP
ftp = FTP('ftp.cwi.nl') # connect to host, default port (some example server, i'll use other one)
ftp.login() # user anonymous, passwd anonymous#
ftp.retrlines('LIST') # list directory contents
ftp.quit()
How do I use ftp.retrlines('LIST') output to check if directory (for example public_html) exists, if it exists cd to it and then execute some other code and exit; if not execute code right away and exit?
nlst() returns a list of the file and directory names on the FTP server. Just check whether your folder name is in it.
from ftplib import FTP
ftp = FTP('yourserver')
ftp.login('username', 'password')
folderName = 'yourFolderName'
if folderName in ftp.nlst():
    pass  # do needed task
You can use a list. Example:
import ftplib
server="localhost"
user="user"
password="test#email.com"
try:
    ftp = ftplib.FTP(server)
    ftp.login(user,password)
except Exception,e:
    print e
else:
    filelist = [] #to store all files
    ftp.retrlines('LIST',filelist.append) # append to list
    f=0
    for f in filelist:
        if "public_html" in f:
            #do something
            f=1
    if f==0:
        print "No public_html"
    #do your processing here
You can send "MLST path" over the control connection.
That will return a line including the type of the path (notice 'type=dir' down here):
250-Listing "/home/user":
modify=20131113091701;perm=el;size=4096;type=dir;unique=813gc0004; /
250 End MLST.
Translated into python that should be something along these lines:
import ftplib
ftp = ftplib.FTP()
ftp.connect('ftp.somedomain.com', 21)
ftp.login()
resp = ftp.sendcmd('MLST pathname')
if 'type=dir;' in resp:
# it should be a directory
pass
Of course the code above is not 100% reliable and would need a 'real' parser.
You can look at the implementation of MLSD command in ftplib.py which is very similar (MLSD differs from MLST in that the response in sent over the data connection but the format of the lines being transmitted is the same):
http://hg.python.org/cpython/file/8af2dc11464f/Lib/ftplib.py#l577
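As a step towards the 'real' parser mentioned above, the entry line can be split into its facts instead of searching the whole reply for a substring. This sketch assumes the server sends all facts on a single line followed by a space and the pathname, as in the sample reply; servers that wrap the facts over several lines would need more work. The function name parse_mlst is hypothetical.

```python
def parse_mlst(response):
    """Split an MLST reply into (pathname, facts dictionary)."""
    for line in response.splitlines():
        if line.startswith("250"):
            continue  # status lines, not the entry itself
        facts_part, _, pathname = line.strip().partition(" ")
        facts = {}
        for fact in facts_part.split(";"):
            key, separator, value = fact.partition("=")
            if separator:
                facts[key.lower()] = value
        return pathname, facts
    raise ValueError("no entry line found in MLST response")

# The sample reply quoted above, as sendcmd() would return it.
sample = ('250-Listing "/home/user":\n'
          ' modify=20131113091701;perm=el;size=4096;type=dir;unique=813gc0004; /\n'
          '250 End MLST.')
pathname, facts = parse_mlst(sample)
print(pathname, facts["type"])
```

Comparing facts["type"] with "dir" avoids a false match when the text type=dir; happens to occur inside a pathname.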
The examples attached to ghostdog74's answer have a bit of a bug: the list you get back is the whole line of the response, so you get something like
drwxrwxrwx 4 5063 5063 4096 Sep 13 20:00 resized
This means that if your directory name is something like '50' (which it was in my case), you'll get a false positive. I modified the code to handle this:
def directory_exists_here(self, directory_name):
    filelist = []
    self.ftp.retrlines('LIST',filelist.append)
    for f in filelist:
        if f.split()[-1] == directory_name:
            return True
    return False
N.B., this is inside an FTP wrapper class I wrote and self.ftp is the actual FTP connection.
Tom is correct, but no one voted him up.
However, for those who voted up ghostdog74, I have combined the two answers into this code. It works for me and should work for you.
import ftplib
server="localhost"
user="user"
uploadToDir="public_html"
password="test#email.com"
try:
    ftp = ftplib.FTP(server)
    ftp.login(user,password)
except Exception,e:
    print e
else:
    filelist = [] #to store all files
    ftp.retrlines('NLST',filelist.append) # append to list
    num=0
    for f in filelist:
        if f.split()[-1] == uploadToDir:
            #do something
            num=1
    if num==0:
        print "No public_html"
    #do your processing here
First of all, if you follow ghostdog74's method and test "public" in f, it evaluates to true even when no such directory exists, because the word public occurs in "public_html". That is where Tom's if condition comes in,
so I changed it to if f.split()[-1] == uploadToDir:.
Also, if you enter a directory name that doesn't exist while other files and folders do exist, the second check in ghostdog74's code never runs, because f is never 0: it is overwritten by the for loop. So I used a separate num variable instead of f, and voila, the goodness follows...
Vinay and Jonathon are right about what they commented.
In 3.x nlst() method is deprecated. Use this code:
import ftplib
remote = ftplib.FTP('example.com')
remote.login()
if 'foo' in [name for name, data in list(remote.mlsd())]:
    pass  # do your stuff
The list() call turns the generator returned by mlsd() into a list before the names are extracted. Strictly speaking it is optional, since a list comprehension can iterate over a generator directly.
You can wrap the [name for name, data in list(remote.mlsd())] list comprehension in a function or method and call it whenever you need to check whether a directory (or file) exists.
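Such a wrapper might look like the sketch below. It also checks the type fact that mlsd() reports for each entry, so a file with the same name is not mistaken for a directory. The function name remote_directory_exists is an assumption, and FakeFTP is only a stand-in so the example runs without a server; with a real connection you would pass the ftplib.FTP object instead.

```python
def remote_directory_exists(ftp, name, path=""):
    """Return True if `path` on the server contains a directory called `name`."""
    for entry_name, facts in ftp.mlsd(path):
        if entry_name == name and facts.get("type") == "dir":
            return True
    return False

class FakeFTP:
    """Stand-in for ftplib.FTP that yields a fixed listing (demo only)."""
    def mlsd(self, path=""):
        yield ".", {"type": "cdir"}
        yield "public_html", {"type": "dir"}
        yield "50", {"type": "file"}

connection = FakeFTP()
print(remote_directory_exists(connection, "public_html"))
print(remote_directory_exists(connection, "public"))
```

Because names are compared for equality, "public" does not match "public_html", which is the false positive discussed in the earlier answers.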
=> I found this web page while googling for a way to check whether a file exists using ftplib in Python. The following is what I figured out (hope it helps someone):
=> When trying to list non-existent files/directories, ftplib raises an exception. Even though adding a try/except block is standard practice and a good idea, I would prefer my FTP scripts to download file(s) only after making sure they exist. This keeps my scripts simpler, at least when listing a directory on the FTP server is possible.
For example, the Edgar FTP server has multiple files stored under the directory /edgar/daily-index/. Each file is named like "master.YYYYMMDD.idx". There is no guarantee that a file will exist for every date (YYYYMMDD): there is no file dated 24th Nov 2013, but there is a file dated 22nd Nov 2013. How does listing work in these two cases?
# Code
from __future__ import print_function
import ftplib
ftp_client = ftplib.FTP("ftp.sec.gov", "anonymous", "MY.EMAIL#gmail.com")
resp = ftp_client.sendcmd("MLST /edgar/daily-index/master.20131122.idx")
print(resp)
resp = ftp_client.sendcmd("MLST /edgar/daily-index/master.20131124.idx")
print(resp)
# Output
250-Start of list for /edgar/daily-index/master.20131122.idx
modify=20131123030124;perm=adfr;size=301580;type=file;unique=11UAEAA398;
UNIX.group=1;UNIX.mode=0644;UNIX.owner=1019;
/edgar/daily-index/master.20131122.idx
250 End of list
Traceback (most recent call last):
File "", line 10, in <module>
resp = ftp_client.sendcmd("MLST /edgar/daily-index/master.20131124.idx")
File "lib/python2.7/ftplib.py", line 244, in sendcmd
return self.getresp()
File "lib/python2.7/ftplib.py", line 219, in getresp
raise error_perm, resp
ftplib.error_perm: 550 '/edgar/daily-index/master.20131124.idx' cannot be listed
As expected, listing a non-existent file generates an exception.
=> Since I know that the Edgar FTP server will surely have the directory /edgar/daily-index/, my script can do the following to avoid raising exceptions due to non-existent files:
a) list this directory.
b) download the required file(s) if they are present in this listing. To check the listing I typically perform a regexp search on the list of strings that the listing operation returns.
For example this script tries to download files for the past three days. If a file is found for a certain date then it is downloaded, else nothing happens.
import ftplib
import re
from datetime import date, timedelta
ftp_client = ftplib.FTP("ftp.sec.gov", "anonymous", "MY.EMAIL#gmail.com")
listing = []
# List the directory and store each directory entry as a string in an array
ftp_client.retrlines("LIST /edgar/daily-index", listing.append)
# go back 1,2 and 3 days
for diff in [1,2,3]:
    today = (date.today() - timedelta(days=diff)).strftime("%Y%m%d")
    month = (date.today() - timedelta(days=diff)).strftime("%Y_%m")
    # the absolute path of the file we want to download - if it indeed exists
    file_path = "/edgar/daily-index/master.%(date)s.idx" % { "date": today }
    # create a regex to match the file's name (dots escaped so they match literally)
    pattern = re.compile(r"master\.%(date)s\.idx" % { "date": today })
    # keep the elements of the listing that match the pattern
    found = [entry for entry in listing if pattern.search(entry)]
    if found:
        # the local ./edgar/daily-index/<month>/ directory must already exist
        ftp_client.retrbinary(
            "RETR %(file_path)s" % { "file_path": file_path },
            open(
                './edgar/daily-index/%(month)s/master.%(date)s.idx' % {
                    "month": month,
                    "date": today
                }, 'wb'
            ).write
        )
=> Interestingly, there are situations where we cannot list a directory on the FTP server. The edgar FTP server, for example, disallows listing on /edgar/data because it contains far too many sub-directories. In such cases, I wouldn't be able to use the "List and check for existence" approach described here - in these cases I would have to use exception handling in my downloader script to recover from non-existent file/directory access attempts.
from ftplib import FTP
ftp = FTP()
ftp.connect(hostname, 21)
ftp.login(username,password)
try:
    ftp.cwd('your folder name')
    pass  # do the code for successful cd
except Exception:
    pass  # do the code for folder not exists
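The same idea can be packaged as a small helper that catches only ftplib.error_perm, the exception ftplib raises for permanent 5xx replies such as a refused CWD, rather than every Exception. The helper name change_to_directory and the RefusingFTP stand-in are assumptions so that the sketch runs without a server.

```python
import ftplib

def change_to_directory(ftp, name):
    """Try to cd into `name`; return True on success, False if the server refuses."""
    try:
        ftp.cwd(name)
    except ftplib.error_perm:
        return False
    return True

class RefusingFTP:
    """Stand-in for ftplib.FTP that only knows one directory (demo only)."""
    def cwd(self, name):
        if name != "public_html":
            raise ftplib.error_perm("550 %s: No such file or directory" % name)
        return "250 CWD command successful"

connection = RefusingFTP()
print(change_to_directory(connection, "public_html"))
print(change_to_directory(connection, "missing"))
```

Other errors, such as a dropped connection, still propagate to the caller instead of being mistaken for a missing folder.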