I use Amazon S3 to store my resources, but sometimes I find it's necessary to open a file that's stored on S3 in order to do some operations on it.
Is it at all possible (and advisable) to open the files directly from S3, or should I just stick to using a temporary "scratch" folder?
Right now I am using the boto extensions for interfacing with Amazon.
It's not possible to open a file directly on S3; you can only read it or add/replace it over the network.
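If the goal is just to modify an object, the usual pattern is the scratch-file approach you describe: download the object, work on the local copy, then upload the result. A rough sketch with boto3 (the newer SDK; the bucket and key names are placeholders):

import tempfile
import boto3

BUCKET = "my-bucket"          # placeholder
KEY = "path/to/resource.txt"  # placeholder

s3 = boto3.client("s3")

# Download the object into a temporary "scratch" file
with tempfile.NamedTemporaryFile() as tmp:
    s3.download_fileobj(BUCKET, KEY, tmp)
    tmp.flush()
    tmp.seek(0)

    # Operate on the local copy
    data = tmp.read().upper()  # stand-in for whatever modification you need

    # Write the modified content back to the same key
    s3.put_object(Bucket=BUCKET, Key=KEY, Body=data)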
There is an open-source command-line tool called s3fs which mounts an S3 bucket as a user-space (FUSE) file system. With a bucket mounted this way, you can use the same commands you would use on ordinary files to open, read, and write a file, but behind the scenes it caches all your writes locally and uploads the file when you close the handle.
I have an application that is hosted through Google App Engine. It is intended to be a file hosting application, where files are uploaded directly to GCS. However, there is some processing that needs to happen with these files, so originally my plan was to download the files, do the modifications, then reupload. Unfortunately, GAE has a read-only file system. What would be the proper way to make file modifications to objects in GCS from GAE? I am unfamiliar with most Google Cloud services, but I see ones such as google-cloud-dataproc; would these be able to do it?
The operations are removing lines from files and combining files into a single .zip.
You can store the file in the tmpfs partition that App Engine mounts at /tmp. It's an in-memory file system, so you will use memory to store the files. If the files are too large, increase the memory size of your App Engine instance, or else you will get an out-of-memory error.
If the file is too big, you have to use another product.
Remember to delete the files after use to free up memory.
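As a rough illustration of that pattern with the google-cloud-storage client (the bucket and object names below are placeholders):

import os
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-gcs-bucket")   # placeholder
blob = bucket.blob("uploads/input.txt")   # placeholder

# Download into the in-memory /tmp filesystem, modify, then re-upload
local_path = "/tmp/input.txt"
blob.download_to_filename(local_path)

# ... perform the modifications on local_path here ...

blob.upload_from_filename(local_path)

# Free the memory backing /tmp once you are done
os.remove(local_path)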
I am following this documentation to download files from EFS
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/efs.html
I have read through the whole documentation and could not figure out a way to download files. The only possibility seems to be using generate_presigned_url().
However, the documentation for this part is very limited. I have tried many times but got stuck. Any suggestions? Thanks.
EFS creates a filesystem for you. For example, if you are using Linux, it will be available as an NFS share which you can access as a regular file system:
Mounting EFS file systems
Then you just use your regular Python or operating system tools to operate on the files stored in the EFS filesystem.
For example, in Python you can use shutil to copy or move files into and out of your EFS-mounted filesystem.
Boto3's interface to EFS is only for its management, not for working with files stored on an EFS filesystem.
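To make the shutil point concrete, here is a minimal sketch assuming the share is already mounted at /mnt/efs (the mount point and file names are only examples):

import shutil

EFS_MOUNT = "/mnt/efs"  # example mount point

# "Downloading" from EFS is just a copy out of the mounted share
shutil.copy(f"{EFS_MOUNT}/data/report.csv", "/home/ec2-user/report.csv")

# And writing to EFS is a copy in the other direction
shutil.copy("/home/ec2-user/report.csv", f"{EFS_MOUNT}/data/report.csv")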
I have a python program that ultimately writes a csv file using pandas:
df.to_csv(r'path\file.csv')
I was able to upload the files to the server via FileZilla and was also able to run the program on the EC2 server normally. However, I would now like to export a csv file to my local machine, but I don't know how to.
Do I have to write the csv file directly to a cloud drive (e.g. google drive via Pydrive)? What would be the easiest way?
You probably do not want to expose your computer to the dangers of the Internet. Therefore, it is better for your computer to 'pull' the data down, rather than allowing something on the Internet to 'push' data to it.
You could send the data to Amazon S3 or, if you are using a cloud-based storage service like Google Drive or Dropbox, use their SDK to upload the file to their storage.
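For the S3 route, a rough sketch of the upload side on the EC2 instance using boto3 (the bucket name and key are placeholders):

import boto3
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})  # stand-in for the DataFrame your program builds

# Write the CSV locally on the instance first, as before
df.to_csv("/tmp/file.csv", index=False)

# Then push it to S3 (placeholder bucket and key)
s3 = boto3.client("s3")
s3.upload_file("/tmp/file.csv", "my-bucket", "exports/file.csv")

From your local machine you can then pull the file down with the AWS CLI (aws s3 cp s3://my-bucket/exports/file.csv .) or a similar boto3 download call.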
I am writing a Python application which reads/parses a file of this kind.
myalerts.ini:
value1=1
value2=3
value3=10
value4=15
Currently I store this file in the local filesystem. If I need to change this file, I need to have physical access to this computer.
I want to move this file to cloud so that I can change this file anywhere (another computer or from phone).
While this application is running on some machine, I should be able to change the file in the cloud, and the application running on another machine (which I don't have physical access to) should be able to read the updated file.
Notes:
I am new to both python and aws.
I am currently running it on my local mac/linux and planning on deploying on aws.
There are many options!
Amazon S3: This is the simplest option. Each computer could download the file at regular intervals or just before they run a process. If the file is big, the app could instead check whether the file has changed before downloading.
Amazon Elastic File System (EFS): If your applications are running on multiple Amazon EC2 instances, EFS provides a shared file system that can be mounted on each instance.
Amazon DynamoDB: A NoSQL database instead of a file. Much faster than parsing a file, but less convenient for updating values; you'd need to write a program to update values, e.g. from the command line.
AWS Systems Manager Parameter Store: A managed service for storing parameters. Applications (anywhere on the Internet) can request and update parameters. A great way to configure cloud-based applications!
If you are looking for minimal change and you want it accessible from anywhere on the Internet, Amazon S3 is the easiest choice.
Whichever way you go, you'll use the boto3 AWS SDK for Python.
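If you go with Amazon S3, a minimal sketch of the read side with boto3 and configparser (the bucket name is a placeholder; a [DEFAULT] header is prepended because the example file has no section header):

import configparser
import boto3

BUCKET = "my-config-bucket"  # placeholder
KEY = "myalerts.ini"

s3 = boto3.client("s3")

# Fetch the latest copy of the file whenever the application needs it
body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read().decode("utf-8")

# The example file has no [section] header, so prepend one for configparser
config = configparser.ConfigParser()
config.read_string("[DEFAULT]\n" + body)

value1 = int(config["DEFAULT"]["value1"])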
Is it possible to create a new excel spreadsheet file and save it to an Amazon S3 bucket without first saving to a local filesystem?
For example, I have a Ruby on Rails web application which currently generates Excel spreadsheets using the write_xlsx gem and saves them to the server's local file system. Internally, it looks like the gem uses Ruby's IO.copy_stream when it saves the spreadsheet. I'm not sure this will work if moving to Heroku and S3.
Has anyone done this before using Ruby or even Python?
I found this earlier question, Heroku + ephemeral filesystem + AWS S3. So, it would seem this is not possible using Heroku. Theoretically, it would be possible using a service which allows attaching an Amazon EBS volume.
There is a dedicated Ruby gem to help you move files to Amazon S3:
https://rubygems.org/gems/aws-s3
If you want more details about the implementation, here is the git repository. The documentation on the page is very complete and explains how to move a file to S3. Hope it helps.
Once your xls file is created, the library helps you create an S3Object and store it in a bucket (which you can also create with the library).
S3Object.store('keyOfYourData', open('nameOfExcelFile.xls'), 'bucketName')
If you want more choice, Amazon also provides an official gem for this purpose: https://rubygems.org/gems/aws-sdk
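Since the question also mentions Python, here is a rough sketch of the same idea there: build the workbook in memory with openpyxl and push the bytes straight to S3 with boto3, never touching the local filesystem (the bucket and key are placeholders):

import io
import boto3
from openpyxl import Workbook

# Build the spreadsheet entirely in memory
wb = Workbook()
ws = wb.active
ws.append(["name", "amount"])
ws.append(["example", 42])

buffer = io.BytesIO()
wb.save(buffer)  # openpyxl accepts a file-like object here
buffer.seek(0)

# Upload the in-memory bytes directly to S3 (placeholder bucket/key)
s3 = boto3.client("s3")
s3.put_object(Bucket="my-bucket", Key="reports/report.xlsx", Body=buffer.getvalue())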