I am following this documentation to download files from EFS
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/efs.html
I have read through the whole documentation and could not figure out a way to download files. The only possible way seems to be generate_presigned_url().
However, the documentation for this part is very limited. I have tried many times but got stuck. Any suggestions? Thanks.
EFS creates a filesystem for you. For example, if you are using Linux, it will be available as an NFS share which you can access as a regular file system:
Mounting EFS file systems
Then you just use your regular Python or operating system tools to operate on the files stored in the EFS filesystem.
For example, in Python you can use shutil to copy or move files into and out of your EFS-mounted filesystem.
Boto3's interface to EFS is only for its management, not for working with files stored on an EFS filesystem.
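For illustration, here is a minimal Python sketch of the shutil approach above, assuming the EFS filesystem has already been NFS-mounted at /mnt/efs (the mount point and file paths are placeholders):

    import shutil

    # Assumes the EFS filesystem is NFS-mounted at /mnt/efs (placeholder path);
    # see "Mounting EFS file systems" above.
    src = "/mnt/efs/reports/2023-01.csv"   # hypothetical file on the EFS mount
    dst = "/tmp/2023-01.csv"               # local destination

    shutil.copy(src, dst)                  # "downloading" is just a local file copy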
What is the best method to grab files from a Windows shared folder on the same network?
Typically, I am extracting data from SFTPs, SalesForce, or database tables, but there are a few cases where end-users need to upload a file to a shared folder that I have to retrieve. My process up to now has been to have a script running on a Windows machine which just grabs any new/changed files and loads them to an SFTP, but that is not ideal. I can't monitor it in my Airflow UI, I need to change my password on that machine physically, mapped network drives seem to break, etc.
Is there a better method? I'd rather the ETL server handle all of this stuff.
Airflow is installed on remote Linux server (same network)
Windows folders are just standard UNC paths where people have access based on their NT ID. These users are saving files which I need to retrieve. These users are non-technical and did not want WinSCP installed to share the data through an SFTP instead, or even a SharePoint site (where I could use Shareplum, I think).
I would like to avoid mounting these folders and instead use Python scripts to simply copy the files I need as per an Airflow schedule
Best if I can save my NT ID and password within an Airflow connection to access it with a conn_id
If I'm understanding the question correctly, you have a shared folder mounted on your local machine — not the Windows server where your Airflow install is running. Is it possible to access the shared folder on the server instead?
I think a file sensor would work for your use case.
If you could auto-sync the shared folder to a cloud file store like S3, then you could use the normal S3KeySensor and S3PrefixSensor that are commonly used. I think this would simplify your solution, as you wouldn't have to be concerned with whether the machine(s) the tasks are running on have access to the folder.
Here are two examples of software that syncs a local folder on Windows to S3. Note that I haven't used either of them personally.
https://www.cloudberrylab.com/blog/how-to-sync-local-folder-with-amazon-s3-bucket-with-cloudberry-s3-explorer/
https://s3browser.com/amazon-s3-folder-sync.aspx
That said, I do think using FTPHook.retrieve_file is a reasonable solution if you can't have your files in cloud storage.
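For illustration, here is a minimal sketch of the S3KeySensor approach, assuming the shared folder is synced into an S3 bucket by one of the tools above; the bucket name, key pattern and connection IDs are placeholders, and import paths/parameter names vary across Airflow versions:

    from datetime import datetime

    from airflow import DAG
    # Import path differs on older Airflow versions (e.g. airflow.sensors.s3_key_sensor).
    from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

    with DAG(
        dag_id="wait_for_shared_file",          # hypothetical DAG name
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        wait_for_file = S3KeySensor(
            task_id="wait_for_file",
            bucket_name="my-sync-bucket",        # bucket the folder is synced into
            bucket_key="shared-folder/*.xlsx",   # pattern for the user-uploaded files
            wildcard_match=True,
            aws_conn_id="aws_default",
            poke_interval=300,                   # check every 5 minutes
        )

A downstream task can then pick up and process the matched files.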
In AWS, similar functionality exists using the awscli as explained here. Does similar functionality exist in Azure using the Python SDK or CLI? Thanks.
There are two services in Azure Storage, Blob Storage & File Storage, but I don't know which of the Azure Storage services you want to synchronise with a folder, or which OS you are using.
As @Gaurav Mantri said, Azure File Sync is a good idea if you want to synchronise a folder with an Azure File Share on your on-premises Windows Server.
However, if you want to synchronise Azure Blobs, or you are using a Unix-like OS such as Linux/macOS, I think you can use Azure Storage Fuse (blobfuse) for Blob Storage or a Samba client for File Storage, together with the rsync command, to achieve your needs.
First of all, the key point of the workaround is to mount the File/Blob service of Azure Storage as a local filesystem; then you can operate on it from Python or other tools just as you would locally, as below.
To mount a blob container as a filesystem, follow the installation instructions to install blobfuse, then configure & run the necessary file/script to mount a blob container of your Azure Storage account, as the wiki page describes.
To mount a file share with a Samba client, please refer to the official document Use Azure Files with Linux.
Then you can directly operate on all data in the blobfuse- or Samba-mounted filesystem, do folder synchronisation with the rsync & inotify commands, or perform any other operations you want.
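As a rough illustration of operating on the mounted storage like a local filesystem, here is a minimal Python sketch that copies missing files from a local folder into a blobfuse or Samba mount point (the paths are placeholders, and the mount is assumed to already exist):

    import os
    import shutil

    # Placeholder paths; assumes the blobfuse/Samba mount already exists at MOUNT_POINT.
    LOCAL_DIR = "/home/me/data"
    MOUNT_POINT = "/mnt/blobfuse/mycontainer"

    for name in os.listdir(LOCAL_DIR):
        src = os.path.join(LOCAL_DIR, name)
        dst = os.path.join(MOUNT_POINT, name)
        # Copy files that are missing on the mounted storage (a crude one-way sync;
        # rsync, as mentioned above, handles this far more robustly).
        if os.path.isfile(src) and not os.path.exists(dst):
            shutil.copy(src, dst)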
Hope it helps. Any concern, please feel free to let me know.
I need to copy a file from a distributed storage network to my Linux machine. I only get the UNC path to the file from a database. There are about 20 servers with more than 30 shares on each server, so mounting them locally with Samba is not a good option.
I tried open(r'\\filestore01\share01\00\000001', 'r') on a Windows machine, which works, but not on Linux. I also found Python Linux-copy files to windows shared drive (samba), which again mounts before copying...
Is that the only solution available? Is there no native library to copy from a Windows share?
The best solution to the problem, to avoid mounting things everywhere, is to use libraries like pysmb:
https://pythonhosted.org/pysmb/api/smb_SMBConnection.html
Probably the best idea would be to write a wrapper for such URLs.
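For example, a minimal pysmb sketch along these lines (the credentials, server name, share and paths are placeholders, mirroring the UNC path from the question) could look like:

    from smb.SMBConnection import SMBConnection

    # Placeholder credentials and names; filestore01/share01 mirror the UNC path
    # \\filestore01\share01\00\000001 from the question.
    conn = SMBConnection(
        "my_user", "my_password",       # NT credentials
        "my-client", "filestore01",     # local NetBIOS name, remote server name
        domain="MYDOMAIN",
        use_ntlm_v2=True,
        is_direct_tcp=True,             # SMB over TCP port 445
    )
    conn.connect("filestore01", 445)

    # Download \\filestore01\share01\00\000001 to a local file.
    with open("/tmp/000001", "wb") as local_file:
        conn.retrieveFile("share01", "/00/000001", local_file)

    conn.close()

A wrapper as suggested would just parse the UNC path into server, share and path components and feed them to retrieveFile.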
I use Amazon S3 for storage for my resources, but sometimes I find it's necessary to open a file that's stored on S3, in order to do some operations on it.
Is it at all possible (and advisable) to open the files directly from S3, or should I just stick to using a temporary "scratch" folder?
Right now I am using the boto extensions for interfacing with Amazon.
It's not possible to open a file directly on S3; you can only read objects or add/replace them over the network.
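If you go the scratch-folder route, a minimal sketch with boto3 (the successor to the boto library mentioned in the question; the bucket and key names are placeholders) would be:

    import os
    import tempfile

    import boto3

    s3 = boto3.client("s3")
    bucket, key = "my-bucket", "resources/data.txt"   # placeholder bucket/key

    # Download the object to a temporary scratch file, work on it locally,
    # then upload the result back to S3.
    with tempfile.TemporaryDirectory() as scratch:
        local_path = os.path.join(scratch, os.path.basename(key))
        s3.download_file(bucket, key, local_path)

        with open(local_path, "a") as f:              # example local operation
            f.write("\nprocessed\n")

        s3.upload_file(local_path, bucket, key)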
There is an open-source command-line tool called s3fs which emulates mounting an S3 bucket as a user-space file system. With a bucket mounted like this, you can use any commands that you would use on ordinary files in your file system to open, read, and write a file, but behind the scenes it does some local caching for all your writes and then uploads the file when you close the handle.
Is there any way to view the contents of a vmdk file from Python, and to be able to read files from it? (I have no need to write to it). If not, is there any way to mount a vmdk file on a host machine, or generally any other way to look at a vmdk file without attaching it to a VM and running it?
You can mount a VMDK as a local disk with Disk Mount Utility.
You may want to take a look at ctypes-vddk if you are looking to import modules for exfiltration of VMDK data through Python. You can find the module here: http://code.google.com/p/ctypes-vddk/
Personally, if you are looking to leverage the VDDK API (via C++), you can use Virtual Disk Development Kit 5.5 and its corresponding API. The actual programming guide can also be found here: hxxp://pubs.vmware.com/vsphere-55/topic/com.vmware.ICbase/PDF/vddk55_programming.pdf. Additionally, there is a tool that implemented this which can be found here: http://sourceforge.net/projects/vfae/. Lastly, there was a writeup on the use of VDDK with regard to VMDK forensic analysis: hxxp://crucialsecurityblog.harris.com/2012/01/18/how-can-vmwares-virtual-disk-development-kit-help-the-forensic-examiner/
enjoy...