Optimize the performance of retrieving file sizes with pysftp - python

I need to get the file details for certain locations (on the local system and on SFTP), and additionally the file size for some of the SFTP locations. The code below does this:
def getFileDetails(location: str):
    filenames: list = []
    if location.find(":") != -1:
        # Local path (contains a drive letter such as "C:")
        for file in glob.glob(location):
            filenames.append(file)
    else:
        with pysftp.Connection(host=myHostname, username=myUsername, password=myPassword) as sftp:
            remote_files = [x.filename for x in sorted(sftp.listdir_attr(location), key=lambda f: f.st_mtime)]
            if location == LOCATION_SFTP_A:
                for filename in remote_files:
                    sftp_archive_d_size_mapping[filename] = sftp.stat(location + "/" + filename).st_size
            elif location == LOCATION_SFTP_B:
                for filename in remote_files:
                    sftp_archive_e_size_mapping[filename] = sftp.stat(location + "/" + filename).st_size
            for filename in remote_files:
                filenames.append(filename)
    return filenames
There are more than 10,000 files in LOCATION_SFTP_A and LOCATION_SFTP_B. For each file, I need to get the file size. To get the size I am using:
sftp_archive_d_size_mapping[filename] = sftp.stat(location + "/" + filename).st_size
sftp_archive_e_size_mapping[filename] = sftp.stat(location + "/" + filename).st_size
# Time Taken : 5 min+
sftp_archive_d_size_mapping[filename] = 1 #sftp.stat(location + "/" + filename).st_size
sftp_archive_e_size_mapping[filename] = 1 #sftp.stat(location + "/" + filename).st_size
# Time Taken : 20-30 s
If I comment out sftp.stat(location + "/" + filename).st_size and assign a static value instead, the entire code takes only 20-30 seconds to run. How can I reduce the run time and still get the file sizes?

Connection.listdir_attr already gives you the file size, in SFTPAttributes.st_size.
There's no need to call Connection.stat for each file to get the size again. Each of those calls is a separate round trip to the server, which is where the extra minutes go.
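A minimal sketch of that idea, assuming the connection object exposes listdir_attr as pysftp.Connection does. The helper name sizes_by_filename and the FakeSftp stand-in are hypothetical; the stand-in only exists so the example runs without a server. In the real code you would pass the sftp object from the with block.

```python
from types import SimpleNamespace


def sizes_by_filename(sftp, location):
    """Return ({filename: size}, [filenames ordered by mtime]) from one listing."""
    # A single listdir_attr call returns one attributes entry per file,
    # and each entry already carries filename, st_mtime and st_size.
    entries = sorted(sftp.listdir_attr(location), key=lambda e: e.st_mtime)
    sizes = {e.filename: e.st_size for e in entries}
    return sizes, [e.filename for e in entries]


class FakeSftp:
    """Hypothetical stand-in for pysftp.Connection so the sketch runs offline."""

    def listdir_attr(self, location):
        return [
            SimpleNamespace(filename="b.csv", st_mtime=200, st_size=2048),
            SimpleNamespace(filename="a.csv", st_mtime=100, st_size=512),
        ]


sizes, names = sizes_by_filename(FakeSftp(), "/archive/d")
print(sizes)
print(names)
```

The returned dictionary can be assigned directly to sftp_archive_d_size_mapping or sftp_archive_e_size_mapping, so no per-file stat call is left in the loop.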
Error while trying to generate xml file with specific name

I am trying to generate a file named, for example, 23-10-2022|21-03-11.xml, or, if the user enters their own name, userGeneratedName23-10-2022-21-03-11.xml. When I specify a particular folder where the generated file should be saved, the program throws an Invalid argument error, and I don't know why. I suspect that I am using join incorrectly, but I don't know how to correct it.
if not os.path.exists("Generated XMLs"):
    os.makedirs("Generated XMLs")
#open file
today = date.today()
now = datetime.now()
#if filename is not specified, create file with today's date and time of creation
if filename == "":
    filename = today.strftime("%d-%m-%Y") +"|"+ now.strftime("%H-%M-%S")
#if filename is specified, add today's date and time of creation at the end of filename
else:
    filename = filename + today.strftime("%d-%m-%Y")+"|"+ now.strftime("%H-%M-%S")
# open file which is in Generated XMLs folder and name it with variable filename
file = open(os.path.join("Generated XMLs", filename + ".xml"), "w")
#write to file
#write header (the XML declaration has to come before the root element)
file.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<root>\n")
#write data
# ask user if he wants netto or brutto meters and add flag
flag = input("Enter 'n' if you want to use netto meters or 'b' if you want to use brutto meters: ")
for i in range(len(convertedData)):
    # i + 1 is an int, so convert it before concatenating
    file.write("\t\t\t<LP>" + str(i+1) + "</LP>\n")
    file.write("\t\t\t\t<KOD>" + convertedData[i][0] + " " + convertedData[i][1] + "</KOD>\n")
    file.write("\t\t\t\t<NAZWA>" + convertedData[i][0] + " " + convertedData[i][1] + "</NAZWA>\n")
    file.write("\t\t\t\t<MPP>" + "0" + "</MPP>\n")
    file.write("\t\t\t\t<STAWKA>" + "23.00" + "</STAWKA>\n")
    file.write("\t\t\t\t<FLAGA>" + "2" + "</FLAGA>\n")
    file.write("\t\t\t\t<ZRODLOWA>" + "0.00" + "</ZRODLOWA>\n")
    if flag == "n":
        file.write("\t\t\t<ILOSC>" + convertedData[i][3] + "00" + "</ILOSC>\n")
    elif flag == "b":
        file.write("\t\t\t<ILOSC>" + convertedData[i][2] + "00" + "</ILOSC>\n")
    else:
        print("Error: Wrong flag. Enter 'n' or 'b'.")
    file.write("\t\t\t<JB>" + convertedData[i][4] + "</JB>\n")
    file.write("\t\t\t<JMZ>" + convertedData[i][4] + "</JMZ>\n")
#write footer
#close file
The exact error I get:
Traceback (most recent call last):
File "C:\Users\reczul\PycharmProjects\pythonProject5\main.py", line 23, in <module>
File "C:\Users\reczul\PycharmProjects\pythonProject5\main.py", line 13, in main
File "C:\Users\reczul\PycharmProjects\pythonProject5\venv\Functions\XMLCreator.py", line 20, in CreateXML
file = open(os.path.join("Generated XMLs", filename + ".xml"), "w")
OSError: [Errno 22] Invalid argument: 'Generated XMLs\\24-10-2022|09-23-45.xml'
Process finished with exit code 1
The problem was the "|" character, which is not allowed in Windows file names, so I replaced it with "--". I also had to convert filename to str.
if not os.path.exists("Generated XMLs"):
    os.makedirs("Generated XMLs")
#open file
today = date.today()
now = datetime.now()
#if filename is not specified, create file with today's date and time of creation
if filename == "":
    filename = today.strftime("%d-%m-%Y") +"--"+ now.strftime("%H-%M-%S")
#if filename is specified, add today's date and time of creation at the end of filename
else:
    filename = filename + today.strftime("%d-%m-%Y")+"--"+ now.strftime("%H-%M-%S")
filename = str(filename)
# open file which is in Generated XMLs folder and name it with variable filename
#file = open(os.path.join(os.path.dirname(os.path.abspath(__file__)),"Generated XMLs",filename + ".xml"), "w")
#create XML file which is in Generated XMLs folder
file = open(os.path.join("Generated XMLs", filename + ".xml"), "w")
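As a rough sketch of the same fix, the name can be built in one helper that also replaces any character Windows rejects in file names (< > : " / \ | ? *), which covers names typed in by the user as well. The helper name timestamped_xml_path and its now parameter are hypothetical and only there to make the example repeatable.

```python
import os
from datetime import datetime

# Characters that Windows does not allow in a file name.
INVALID_WINDOWS_CHARS = '<>:"/\\|?*'


def timestamped_xml_path(filename, folder="Generated XMLs", now=None):
    """Build '<folder>/<filename><dd-mm-YYYY--HH-MM-SS>.xml' with a safe name."""
    now = now or datetime.now()
    stamp = now.strftime("%d-%m-%Y--%H-%M-%S")
    name = str(filename) + stamp
    # Replace every disallowed character with a dash.
    cleaned = "".join("-" if ch in INVALID_WINDOWS_CHARS else ch for ch in name)
    return os.path.join(folder, cleaned + ".xml")


fixed_time = datetime(2022, 10, 23, 21, 3, 11)
example = timestamped_xml_path("report", now=fixed_time)
with_pipe = timestamped_xml_path("a|b", now=fixed_time)
print(example)
print(with_pipe)
```

The returned path can be passed straight to open(..., "w") once the folder has been created with os.makedirs.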

Python - Path/Folder/File creation

I am running the following block of code to create the path to a new file:
# Opens/create the file that will be created
device_name = target_device["host"].split('.')
path = "/home/user/test_scripts/configs/" + device_name[-1] + "/"
# Check if path exists
if not os.path.exists(path):
    os.makedirs(path)
# file = open(time_now + "_" + target_device["host"] + "_config.txt", "w")
file = open(path + time_now + "_" + device_name[0] + "_config.txt", "w")
# Time Stamp File
file.write('\n Create on ' + now.strftime("%Y-%m-%d") +
           ' at ' + now.strftime("%H:%M:%S") + ' GMT\n')
# Writes output to file
# Close file
The code runs as intended, except that it creates and saves the files in the directory /home/user/test_scripts/configs/ instead of the intended one, which should be /home/user/test_scripts/configs/device_name[-1]/.
Please advise.
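Try using os.path.join(base_path, new_path) instead of string concatenation. For example:
path = os.path.join("/home/user/test_scripts/configs/", device_name[-1])
# Create the directory (and any missing parents); do nothing if it already exists
os.makedirs(path, exist_ok=True)
new_name = time_now + "_" + device_name[0] + "_config.txt"
with open(os.path.join(path, new_name), "w+") as file:
    file.write('\n Create on ' + now.strftime("%Y-%m-%d") +
               ' at ' + now.strftime("%H:%M:%S") + ' GMT\n')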
Although I don't get why you're naming the directory after device_name[-1] but the file after device_name[0].
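To check where the file actually ends up, here is a small self-contained sketch of the same approach. It runs against a temporary directory instead of /home/user/test_scripts/configs/, and the helper name config_file_path, the host name and the timestamp string are made up for illustration.

```python
import os
import tempfile


def config_file_path(base_dir, host, time_now):
    """Create <base_dir>/<last host label>/ and return the config file path in it."""
    device_name = host.split(".")
    path = os.path.join(base_dir, device_name[-1])
    os.makedirs(path, exist_ok=True)  # no error if the directory already exists
    new_name = time_now + "_" + device_name[0] + "_config.txt"
    return os.path.join(path, new_name)


with tempfile.TemporaryDirectory() as base_dir:
    target = config_file_path(base_dir, "router1.lab.example", "2022-10-24_09-23-45")
    with open(target, "w") as file:
        file.write("test\n")
    relative = os.path.relpath(target, base_dir)
    created = os.path.isfile(target)

print(relative)
```

Printing the path relative to the base directory makes it easy to confirm that the file sits inside the sub-directory rather than next to it.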

How to decompress zip files across a Windows folder in Python

I have a large folder with 900+ sub-folders. Each of them contains another folder, which in turn contains a zipped file.
How can I decompress all the zipped files in their respective folder OR in a separate folder elsewhere in Windows using Python?
Any help would be great!!
You could try something like:
import os
import zipfile

def unzip(source_filename, dest_dir):
    with zipfile.ZipFile(source_filename) as zf:
        for member in zf.infolist():
            # Skip members whose path contains '..', so nothing is written outside dest_dir
            extract_allowed = True
            words = member.filename.split('/')
            for word in words:
                if word == '..':
                    extract_allowed = False
            if extract_allowed:
                zf.extract(member, dest_dir)

def unzipFiles(dest_dir):
    for file in os.listdir(dest_dir):
        full_path = os.path.join(dest_dir, file)
        if os.path.isdir(full_path):
            # Recurse without returning, so the remaining entries are still processed
            unzipFiles(full_path)
        elif file.endswith(".zip"):
            print('Found file: "' + file + '" in "' + dest_dir + '" - extracting')
            unzip(full_path, dest_dir)
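A shorter variant of the same idea, swapping the manual recursion for os.walk and the per-member loop for ZipFile.extractall. This is only a sketch: the function name unzip_tree is hypothetical, and the demo builds a tiny folder tree in a temporary directory so that it can run anywhere.

```python
import os
import tempfile
import zipfile


def unzip_tree(root_dir):
    """Extract every .zip found under root_dir into the folder that contains it."""
    # Collect the archive paths first, so files created by extraction
    # are not picked up while the tree is still being walked.
    archives = [
        os.path.join(dirpath, name)
        for dirpath, _dirnames, filenames in os.walk(root_dir)
        for name in filenames
        if name.lower().endswith(".zip")
    ]
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(os.path.dirname(archive))
    return archives


# Demo: root/sub1/inner/data.zip containing a single text file.
with tempfile.TemporaryDirectory() as root:
    inner = os.path.join(root, "sub1", "inner")
    os.makedirs(inner)
    with zipfile.ZipFile(os.path.join(inner, "data.zip"), "w") as zf:
        zf.writestr("hello.txt", "hello")
    found = unzip_tree(root)
    count = len(found)
    extracted = os.path.isfile(os.path.join(inner, "hello.txt"))

print(count, extracted)
```

To extract everything into a separate folder instead, pass a different destination to extractall, for example a sub-folder named after the archive.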

Python tarfile gzipped file bigger than sum of source files

I have a Python routine which archives file recordings into a GZipped tarball. The output file appears to be far larger than the source files, and I cannot work out why. As an example of the scale of the issue, 6GB of call recordings are generating an archive of 10GB.
There appear to be no errors in the script and the output .gz file is readable and appears OK apart from the huge size.
Excerpt from my script as follows:
# construct tar filename and open file
client_fileid = client_id + "_" + dt.datetime.now().strftime("%Y%m%d_%H%M%S")
tarname = tar_path + "/" + client_fileid + ".tar.gz"
print("Opening tar file %s \n" % tarname)
try:
    tar = tarfile.open(tarname, "w:gz")
except Exception:
    print("Error opening tar file: %s" % sys.exc_info()[0])
sql="""SELECT number, er.id, e.id, flow, filename, filesize, unread, er.cr_date, callerid,
length, callid, info, party FROM extension_recording er, extension e, client c
WHERE er.extension_id = e.id AND e.client_id = c.id AND c.parent_client_id = %s
AND DATE(er.cr_date) BETWEEN '%s' AND '%s'""" % (client_id, start_date, end_date)
rows = cur.execute(sql)
recordings = cur.fetchall()
if rows == 0: sys.exit("No recordings for selected date range - exiting")
for recording in recordings: # loop through recordings cursor
    ext_len = len(str(recording[0]))
    # add preceding zeroes if the ext no starts with 0 or 00
    if ext_len == 2: extension_no = "0" + str(recording[0])
    elif ext_len == 1: extension_no = "00" + str(recording[0])
    else: extension_no = str(recording[0])
    filename = recording[4]
    extended_no = client_id + "*%s" % (extension_no)
    sourcedir = recording_path + "/" + extended_no
    tardir = extended_no + "/" + filename
    complete_name = sourcedir + "/" + filename
    try:
        tar.add(complete_name, arcname=tardir) # add to tar archive
    except Exception:
        print("Error '%s' writing to tar file %s" % (sys.exc_info()[1], csvfullfilename))

Copy a file to a new location and increment filename, python

I am trying to:
Loop through a bunch of files.
Make some changes.
Copy the old file to a sub-directory. Here's the kicker: I don't want to overwrite the file in the new directory if it already exists (e.g. if "Filename.mxd" already exists, then copy and rename to "Filename_1.mxd"; if "Filename_1.mxd" exists, then copy the file as "Filename_2.mxd", and so on).
Save the file (but do a save, not a save as, so that it overwrites the existing file).
It goes something like this:
for filename in glob.glob(os.path.join(folderPath, "*.mxd")):
    fullpath = os.path.join(folderPath, filename)
    mxd = arcpy.mapping.MapDocument(filename)
    if os.path.isfile(fullpath):
        basename, filename2 = os.path.split(fullpath)
        # Make some changes to my file here
        # Copy the in memory file to a new location. If the file name already exists, then rename the file with the next instance of i (e.g. filename + "_" + i)
        for i in range(50):
            if i > 0:
                print("Test1")
                if arcpy.Exists(draftloc + "\\" + filename2) or arcpy.Exists(draftloc + "\\" + shortname + "_" + str(i) + extension):
                    print("Test2")
                else:
                    print("Test3")
                    arcpy.Copy_management(filename2, draftloc + "\\" + shortname + "_" + str(i) + extension)
First, I decided to set the range well beyond the number of copies I ever expect to occur (50). I'm sure there's a better way of doing this, by incrementing to the next number without setting a range.
Second, as you may see, the script saves a copy for every value in the range. I just want to save it once, under the first value of i that is not already taken.
Hope this makes sense,
Use a while loop instead of a for loop. Use the while loop to find the first i whose name is not taken yet, and save once, after the loop.
The code/pseudocode would look like:
result_name = draftloc + "\\" + filename2
i = 0
while arcpy.Exists(result_name):
    i += 1
    result_name = draftloc + "\\" + shortname + "_" + str(i) + extension
save as result_name
This should fix both issues.
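The same loop as a runnable sketch outside ArcGIS. It swaps arcpy.Exists for os.path.exists and creates empty files instead of calling a save method, so the function name next_free_path and the temporary draftloc directory are assumptions made for the demo only.

```python
import os
import tempfile


def next_free_path(directory, filename):
    """Return a path in directory that does not exist yet: name, name_1, name_2, ..."""
    shortname, extension = os.path.splitext(filename)
    result_name = os.path.join(directory, filename)
    i = 0
    while os.path.exists(result_name):
        i += 1
        result_name = os.path.join(directory, shortname + "_" + str(i) + extension)
    return result_name


# Demo: "save" the same file name three times into an empty folder.
with tempfile.TemporaryDirectory() as draftloc:
    picked = []
    for _ in range(3):
        target = next_free_path(draftloc, "Filename.mxd")
        open(target, "w").close()  # stands in for the real save/copy call
        picked.append(os.path.basename(target))

print(picked)
```

Because the loop only searches for a name and the save happens once afterwards, there is no upper limit to choose and no extra copies are written.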
Thanks to Maty's suggestion above, I've come up with my answer. For those who are interested, my code is:
result_name = filename2
print(result_name)
i = 0
# Check if file exists
if arcpy.Exists(draftloc + "\\" + result_name):
    # If it does, increment i by 1
    i += 1
    # Keep incrementing while the numbered filename is already taken
    while arcpy.Exists(draftloc + "\\" + shortname + "_" + str(i) + extension):
        i += 1
    # Save once, under the first numbered filename that does not exist yet
    mxd.saveACopy(draftloc + "\\" + shortname + "_" + str(i) + extension)
# else, if the original file name is not taken, then save it.
else:
    mxd.saveACopy(draftloc + "\\" + result_name)
