Zip a folder on S3 using Django - python

I have an application where in I need to zip folders hosted on S3.The zipping process will be triggered from the model save method.The Django app is running on an EC2 instance.Any thoughts or leads on the same?
I tried django_storages but haven't got a breakthrough

from my understanding you can't zip files directly on s3. you would have to download the files you'd want to zip, zip them up, then upload the zipped file. i've done something similar before and used s3cmd to keep a local synced copy of my s3bucket, and since you're on an ec2 instance network speed and latency will be pretty good.

Related

How to run code on aws ec2 with local input and output

I have code on aws ec2. Right now, it accepts input and output files from s3. Its an inefficient process. I have to upload input file to s3, copy s3 to ec2, run program, copy output files from ec2 to s3, then download locally.
Is there a way to run the code on ec2 and accept a local file as input and then have the output saved on my local machine?
It appears that your scenario is:
Some software on an Amazon EC2 instance is used to process data on the local disk
You are manually transferring that data to/from the instance via Amazon S3
An Amazon EC2 instance is just like any other computer. It runs the same operating system and the same software as you would on a server in your company. However, it does benefit from being in the cloud in that it has easy access to other services (such as Amazon S3) and resources can be turned off to save expense.
Optimize current process
In sticking with the current process, you could improve it with some simple automation:
Upload your data to Amazon S3 via an AWS Command-Line Interface (CLI) command, such as: aws s3 cp file.txt s3://my-bucket/input/
Execute a script on the EC2 process that will:
Download the file, eg: aws s3 cp s3://my-bucket/input/file.txt .
Process the file
Copy the results to S3, eg: aws s3 cp file.txt s3://my-bucket/output/
Download the results to your own computer, eg: aws s3 cp s3://my-bucket/output/file.txt .
Use scp to copy files
Assuming that you are connect to a Linux instance, you could automate via:
Use scp to copy the file to the EC2 instance (which is very similar to the SSH command)
Use ssh with a [remote command(https://malcontentcomics.com/systemsboy/2006/07/send-remote-commands-via-ssh.html) parameter to trigger the remote process
Use scp to copy the file down once complete
Re-architect to use AWS Lambda
If the job that runs on the data is suitable for being run as an AWS Lambda function, then the flow would be:
Upload the data to Amazon S3
This automatically triggers the Lambda function, which processes the data and stores the result
Download the result from Amazon S3
Please note that an AWS Lambda function runs for a maximum of 15 minutes and has a limit of 512MB of temporary disk space. (This can be expanded by using Amazon EFS is needed.)
Something in-between
There are other ways to upload/download data, such as running a web server on the EC2 instance and interacting via a web browser, or using AWS Systems Manager Run Command to trigger the process on the EC2 instance. Such a choice would be based on how much you are permitted to modify what is running on the instance and your technical capabilities.
#John Rotenstein we have solved the problem of loading 60MB+ models into Lambdas by attaching AWS EFS volumes via VPC. Also solves the problem with large libs such as Tensorflow, opencv etc. Basically lambda layers almost become redundant and you can really sit back and relax, this saved us days if not weeks of tweaking, building and cherry picking library components from source allowing us to concentrate on the real problem. Beats loading from S3 everytime too. The EFS approach would require an ec2 instance obviously.

Run Python Script on AWS and transfer 5GB of files to EC2

I am an absolute beginner in AWS: I have created a key and an instance, the python script I want to run in the EC2 environment needs to loop through around 80,000 filings, tokenize the sentences in them, and use these sentences for some unsupervised learning.
This might be a duplicate; but I can't find a way to copy these filings to the EC2 environment and run the python script in EC2, I am also not very sure as to how I can use boto3. I am using Mac OS. I am just looking for any way to speed things up. Thank you so so much! I am forever grateful!!!
Here's what I tried recently:
Create the bucket and keep the bucket accessible for public.
Create the role and add HTTP option.
Upload all the files and make sure the files are public accessible.
Get the HTTP link of the S3 file.
Connect the instance through putty.
wget copies the file into EC2
instance.
If your files are in zip format, one time copy enough to move all the files into instance.
Here's one way that might help:
create a simple IAM role that allows S3 access to the bucket holding your files
apply that IAM role to the running EC2 instance (or launch a new instance with the IAM role)
install the awscli on the EC2 instance
SSH to the instance and sync the S3 files to the EC2 instance using aws s3 sync
run your app
I'm assuming you've launched EC2 with enough diskspace to hold the files.

Python and Dropbox: How to change SmartSync setting for locally created files?

I have a working Python script that generates and saves hi-res image files to a local Dropbox folder (synced through the Windows Dropbox app). Is there a way in Python to change the SmartSync setting for the newly created image from "Local" to "Online Only" so that I can save space on my local hard drive? I know I could use the Dropbox API v2 to just upload the file and then delete the temporary local files after, but I'm wondering if there is a way to directly change the file settings since it already gets saved to the synced Dropbox folder.
Thanks!
No, unfortunately Dropbox doesn't offer an API for managing Smart Sync settings like this, but I'll pass this along as a feature request.

How can I update a CSV stored on AWS S3 with a Python script?

I have a CSV which is stored in an AWS S3 bucket and is used to store information which gets loaded into a HTML document via some jQuery.
I also have a Python script which is currently sat on my local machine ready to be used. This Python script scrapes another website and saves the information to the CSV file which I then upload to my AWS S3 bucket.
I am trying to figure out a way that I can have the Python script run nightly and overwrite the CSV stored in the S3 bucket. I cannot seem to find a similar solution to my problem online and am vastly out of my depth when it comes to AWS.
Does anyone have any solutions to this problem?
Cheapest way: Modify your Python script to work as an AWS Lambda function, then schedule it to run nightly.
Easiest way: Spin up an EC2 instance, copy the script to the instance, and schedule it to run nightly via cron.

How to save excel file to amazon s3 from python or ruby

Is it possible to create a new excel spreadsheet file and save it to an Amazon S3 bucket without first saving to a local filesystem?
For example, I have a Ruby on Rails web application which now generates Excel spreadsheets using the write_xlsx gem and saving it to the server's local file system. Internally, it looks like the gem is using Ruby's IO.copy_stream when it saves the spreadsheet. I'm not sure this will work if moving to Heroku and S3.
Has anyone done this before using Ruby or even Python?
I found this earlier question, Heroku + ephemeral filesystem + AWS S3. So, it would seem this is not possible using Heroku. Theoretically, it would be possible using a service which allows adding an Amazon EBS.
You have dedicated Ruby Gem to help you moving file to Amazon S3:
https://rubygems.org/gems/aws-s3
If you want more details about the implementation, here is the git repository. The documentation on the page is very complete, and explain how to move file to S3. Hope it helps.
Once your xls file is created, the library helps you create a S3Objects and store it into a Bucket (which you can also create with the library).
S3Object.store('keyOfYourData', open('nameOfExcelFile.xls'), 'bucketName')
If you want more choice, Amazon also delivered an official Gem for this purpose: https://rubygems.org/gems/aws-sdk

Categories