right now this is all I have:
import ftputil

a_host = ftputil.FTPHost("ftp_host", "username", "pass")  # log in to the FTP server
for dirname, subdirs, files in a_host.walk("/"):  # walk the directory tree
    for f in files:
        fullpath = a_host.path.join(dirname, f)
        if fullpath.endswith('.html'):
            pass  # stuck here
So I can log in to my FTP server and .walk through my files.
The thing I am not able to manage: when .walk finds an HTML file, I also want to search inside it for a string.
for example:
on my FTP server there is an index.html and a something.txt file.
I want .walk to find the index.html file, and then search index.html for 'my string'.
thanks
FTP is a protocol for file transfer only. It does not by itself have the ability to execute remote commands, which would be needed to search the files on the remote server (there is a SITE command, but it usually cannot be used for such a purpose, because it is either not implemented or restricted to only a few commands).
This means your only option with FTP is to download the file and search it locally, i.e. transfer the file to the local system, open it there and look for the string.
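A minimal sketch of that download-and-search approach with ftputil: remote files opened through the host object support the usual file API, so "download and search locally" reduces to open() plus read(). The helper function and its name are illustrative, not part of the question's code:

```python
def find_html_files_containing(host, needle, top="/"):
    """Walk the remote tree; return paths of .html files whose text contains `needle`."""
    matches = []
    for dirname, _subdirs, files in host.walk(top):
        for name in files:
            fullpath = host.path.join(dirname, name)
            if not fullpath.endswith(".html"):
                continue
            # ftputil remote file objects behave like local ones
            with host.open(fullpath, "r") as remote:
                if needle in remote.read():
                    matches.append(fullpath)
    return matches

# Usage (hypothetical credentials):
# import ftputil
# with ftputil.FTPHost("ftp_host", "username", "pass") as host:
#     print(find_html_files_containing(host, "my string"))
```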
My code takes a list of file and folder paths, loops through them, and uploads them to Google Drive. If a given path is a directory, the code creates a .zip file before uploading. Once the upload is complete, I need the code to delete the .zip file that was created, but the deletion throws an error: PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Temp\Newfolder.zip'. The file_path being given is C:\Temp\Newfolder. From what I can tell, the only process using the file is this script, and with open(...) should be closing the file when the processing is complete. I'm looking for suggestions on what could be done differently.
import os
import zipfile
import time

def add_list(filePathList, filePath):
    filePathList.append(filePath)
    return filePathList

def deleteFiles(filePaths):
    for path in filePaths:
        os.remove(path)

for file_path in file_paths:
    if os.path.isdir(file_path) and not os.path.splitext(file_path)[1]:
        # Compress the folder into a .zip file
        folder_name = os.path.basename(file_path)
        zip_file_path = os.path.join('C:\\Temp', folder_name + '.zip')
        with zipfile.ZipFile(zip_file_path, 'w') as zip_file:
            for root, dirs, files in os.walk(file_path):
                for filename in files:
                    file_to_zip = os.path.join(root, filename)
                    zip_file.write(file_to_zip, os.path.relpath(file_to_zip, file_path))
        # Update the file path to the zipped file
        file_path = zip_file_path
    # Create request body
    request_body = {
        'name': os.path.basename(file_path),
        'mimeType': 'application/zip' if file_path.endswith('.zip') else 'text/plain',
        'parents': [parent_id],
        'supportsAllDrives': True
    }
    # Open the file and execute the request
    with open(file_path, "rb") as f:
        media_file = MediaFileUpload(file_path, mimetype='application/zip' if file_path.endswith('.zip') else 'text/plain')
        upload_file = service.files().create(body=request_body, media_body=media_file, supportsAllDrives=True).execute()
    # Print the response
    print(upload_file)
    if file_path.endswith('.zip'):
        add_list(filePathList, file_path)

time.sleep(10)
deleteFiles(filePathList)
You're using with statements to properly close the files you open, but you're not actually using those open files; you're just passing the path to another API that is presumably opening the file for you under the hood. Check the documentation for the APIs here:

media_file = MediaFileUpload(file_path, mimetype='application/zip' if file_path.endswith('.zip') else 'text/plain')
upload_file = service.files().create(body=request_body, media_body=media_file, supportsAllDrives=True).execute()

to figure out whether MediaFileUpload objects, or the .create/.execute calls that use them, provide some mechanism for deterministic resource cleanup. As is, if MediaFileUpload opens the file and nothing inside .create/.execute explicitly closes it, at least one such file is guaranteed to still be open when you try to remove them at the end (possibly more than one, if reference cycles are involved or you're on an alternate Python interpreter), which causes the problem you see on Windows. I can't say what might or might not be required, because you don't show the implementation or specify the package it comes from.
Even if you are careful not to hold open handles yourself, there are cases where another process can lock the file, and you need to retry a few times. Virus scanners can cause this issue by attempting to scan the file just as you delete it; a properly implemented scanner should not interfere, but scanners are software written by humans, so some of them misbehave. It is particularly common for freshly created files (scanners being innately suspicious of any new file, especially compressed ones), and as it happens, you're creating fresh zip files and deleting them in short order, so you may be stuck retrying a few times.
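Independent of the handle question, a retry loop around the delete is a cheap mitigation for the scanner case. A sketch (the function name, attempt count, and delay are arbitrary choices, not part of any API):

```python
import os
import time

def remove_with_retry(path, attempts=5, delay=0.5):
    """Delete `path`, retrying briefly if another process still holds a handle."""
    for attempt in range(attempts):
        try:
            os.remove(path)
            return
        except PermissionError:
            if attempt == attempts - 1:
                raise  # still locked after all attempts; surface the error
            time.sleep(delay)  # give the other process time to let go
```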
My Flask app has a function that compresses log files in a directory into a zip file and then sends the file to the user to download. The compression works, except that when the client receives the zip file, it contains a series of folders matching the absolute paths of the original files that were zipped on the server. However, the zip file that was created in the server's static folder does not.
Zipfile contents in static folder: "log1.bin, log2.bin"
Zipfile contents that was sent to user: "/home/user/server/data/log1.bin, /home/user/server/data/log2.bin"
I don't understand why using "send_file" seems to make this change to the zip file contents and fills the received zip file with sub folders. The actual contents of the received zip file do in fact match the contents of the sent zip file in terms of data, but the user has to click through several directories to get to the files. What am I doing wrong?
@app.route("/download")
def download():
    os.chdir(data_dir)
    if os.path.isfile("logs.zip"):
        os.remove("logs.zip")
    log_dir = os.listdir('.')
    log_zip = zipfile.ZipFile('logs.zip', 'w')
    for log in log_dir:
        log_zip.write(log)
    log_zip.close()
    return send_file("logs.zip", as_attachment=True)
Using send_from_directory(directory, "logs.zip", as_attachment=True) fixed everything. It appears this call is better suited for serving up static files.
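For completeness, the absolute-path symptom can also be avoided at zip-creation time by passing an explicit arcname, so the archive contents never depend on the working directory. A standard-library sketch (the helper function and its name are assumptions, not Flask API):

```python
import os
import zipfile

def zip_logs(data_dir, zip_path):
    """Zip the files in data_dir, storing each under its bare filename."""
    with zipfile.ZipFile(zip_path, "w") as zf:
        for name in os.listdir(data_dir):
            full = os.path.join(data_dir, name)
            if os.path.isfile(full):
                zf.write(full, arcname=name)  # arcname drops any directory prefix
    return zip_path
```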
I was able to successfully upload a file from S3 to an SFTP location using the syntax below, as given by @Martin Prikryl in Transfer file from AWS S3 to SFTP using Boto 3.
with sftp.open('/sftp/path/filename', 'wb') as f:
s3.download_fileobj('mybucket', 'mykey', f)
I have a requirement to archive the previous file from the current folder into the archive folder before uploading the current dated file from S3 to SFTP.
I am trying to achieve this using a wildcard, because sometimes, when running on Monday, you won't find Sunday's file; the previous file you have is Friday's. So I want to match any previous file, irrespective of its date.
Example
I have folder as below and filename_20200623.csv needs to be moved to ARCHIVE folder and the new file filename_20200625.csv will be uploaded.
MKT
    ABC
        ARCHIVE
        filename_20200623.csv
Expected
MKT
    ABC
        ARCHIVE
            filename_20200623.csv
        filename_20200625.csv
Use Connection.listdir_attr to retrieve a list of all files in the directory, filter it to those you are interested in, and then move them one by one using Connection.rename:
import stat

remote_path = "/remote/path"
archive_path = "/archive/path"

for f in sftp.listdir_attr(remote_path):
    if (not stat.S_ISDIR(f.st_mode)) and f.filename.startswith('prefix'):
        remote_file_path = remote_path + "/" + f.filename
        archive_file_path = archive_path + "/" + f.filename
        print("Archiving %s to %s" % (remote_file_path, archive_file_path))
        sftp.rename(remote_file_path, archive_file_path)
For future readers who use Paramiko: the code will be identical, except that sftp will refer to the Paramiko SFTPClient class instead of the pysftp Connection class, as Paramiko's SFTPClient.listdir_attr and SFTPClient.rename methods behave identically to those of pysftp.
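If the wildcard requirement calls for more than a fixed prefix, the filter line above can use the standard-library fnmatch module instead of startswith. A sketch (the helper name and the pattern are illustrative):

```python
import fnmatch

def match_archivable(filenames, pattern="filename_*.csv"):
    """Return the names matching the wildcard, whatever date they carry."""
    return [name for name in filenames if fnmatch.fnmatch(name, pattern)]
```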
I am trying to add FTP functionality into my Python 3 script using only ftplib or other libraries that are included with Python. The script needs to delete a directory from an FTP server in order to remove an active web page from our website.
The problem is that I cannot find a way to delete the .htaccess file using ftplib and I can't delete the directory because it is not empty.
Some people have said that this is a hidden file, and have explained how to list hidden files, but I need to delete the file, not list it. My .htaccess file also has full permissions and it can be successfully deleted using most other FTP clients.
Sample code:
files = list(ftp.nlst(myDirectory))
for f in files:
    ftp.delete(f)
ftp.rmd(myDirectory)
Update: I was able to get everything working correctly, here is the complete code:
ftp.cwd(myDirectory)  # move to the dir to be deleted

# upload a placeholder .htaccess in case there is none in the dir, then delete it
files01 = "c:\\files\\.htaccess"
with open(files01, 'rb') as f:
    ftp.storlines('STOR %s' % '.htaccess', f)
ftp.delete(".htaccess")
print("Successfully deleted .htaccess file in " + myDirectory)

files = list(ftp.nlst(myDirectory))  # delete files in dir
for f in files:
    ftp.delete(f)
print("Successfully deleted visible files in " + myDirectory)

ftp.rmd(myDirectory)  # remote directory deletion
print("Successfully deleted the following directory: " + myDirectory)
I have connected to an FTP server and the connection is successful.
import ftplib
ftp = ftplib.FTP('***', '****','****')
listoffiles = ftp.dir()
print (listoffiles)
I have a few CSV files on this FTP server and a few folders which contain some more CSVs.
I need to identify the list of folders at this location (home) and navigate into them. I think the cwd command should work.
I also need to read the CSVs stored on this FTP server. How can I do that? Is there a way to load the CSVs directly into pandas?
Based on the answer here (Python write create file directly in FTP) and my own knowledge about ftplib:
What you can do is the following:
from ftplib import FTP
import io
import pandas

session = FTP('***', '****', '****')

# get the filenames in the ftp home/root
remoteFilenames = session.nlst()
if ".." in remoteFilenames:
    remoteFilenames.remove("..")
if "." in remoteFilenames:
    remoteFilenames.remove(".")

# iterate over the filenames and check which ones are folders
for dirname in remoteFilenames:
    dirTest = session.nlst(dirname)
    # this directory test does not work on certain servers
    if dirTest and len(dirTest) > 1:
        # it's a directory => go into the directory
        session.cwd(dirname)
        # get the filenames one level deeper
        remoteFilenames2 = session.nlst()
        if ".." in remoteFilenames2:
            remoteFilenames2.remove("..")
        if "." in remoteFilenames2:
            remoteFilenames2.remove(".")
        for filename in remoteFilenames2:
            # check again whether the name is a directory, and this time skip it
            dirTest = session.nlst(filename)
            if dirTest and len(dirTest) > 1:
                continue
            # download the file, but first create a virtual file object for it
            download_file = io.BytesIO()
            session.retrbinary("RETR {}".format(filename), download_file.write)
            download_file.seek(0)  # after writing, go back to the start of the virtual file
            df = pandas.read_csv(download_file)  # read the virtual file into pandas
            ##########
            # do your thing with pandas here
            ##########
            download_file.close()  # close the virtual file
        session.cwd("..")  # go back up so the next directory test starts from the root

session.quit()  # close the ftp session
Alternatively, if you know the structure of the FTP server, you could loop over a dictionary describing the folder/file structure and download the files via ftplib or urllib, as in this example:
folders = {"folder1": ["file1", "file2"], "folder2": ["file1"]}
for folder, files in folders.items():
    for file in files:
        path = "/{}/{}".format(folder, file)
        ##########
        # specific ftp file download stuff here
        ##########
        ##########
        # do your thing with pandas here
        ##########
Both solutions can be optimized by making them recursive, or in general by supporting more than one level of folders.
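A recursive variant of the first approach might look like the sketch below. It reuses the answer's nlst-based directory heuristic (which, as noted above, does not work on every server), and the session object is simply anything with an ftplib-style nlst method:

```python
def walk_ftp(session, path=""):
    """Recursively yield remote file paths, using nlst() both to list entries
    and to detect directories (a listing longer than one entry is assumed
    to be a directory; this heuristic is server-dependent)."""
    for name in session.nlst(path):
        if name in (".", ".."):
            continue
        child = "{}/{}".format(path, name) if path else name
        listing = session.nlst(child)
        if listing and len(listing) > 1:
            # assumed directory => recurse one level deeper
            yield from walk_ftp(session, child)
        else:
            yield child
```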
Better late than never... I was able to read directly into pandas. Not sure if this works for anyone.
import pandas as pd
from ftplib import FTP
ftp = FTP('ftp.[domain].com') # you need to put in your correct ftp domain
ftp.login() # i don't need login info for my ftp
ftp.cwd('[Directory]') # change directory to where the file is
df = pd.read_csv("[file.csv]", delimiter = "|", encoding='latin1') # i needed to specify delimiter and encoding
df.head()