I have code on AWS EC2. Right now, it accepts input and output files from S3. It's an inefficient process: I have to upload the input file to S3, copy it from S3 to EC2, run the program, copy the output files from EC2 to S3, then download them locally.
Is there a way to run the code on ec2 and accept a local file as input and then have the output saved on my local machine?
It appears that your scenario is:
Some software on an Amazon EC2 instance is used to process data on the local disk
You are manually transferring that data to/from the instance via Amazon S3
An Amazon EC2 instance is just like any other computer. It runs the same operating system and the same software as you would on a server in your company. However, it does benefit from being in the cloud in that it has easy access to other services (such as Amazon S3) and resources can be turned off to save expense.
Optimize current process
If you stick with the current process, you could improve it with some simple automation (a boto3 sketch of the upload/download side follows the list):
Upload your data to Amazon S3 via an AWS Command-Line Interface (CLI) command, such as: aws s3 cp file.txt s3://my-bucket/input/
Execute a script on the EC2 instance that will:
Download the file, eg: aws s3 cp s3://my-bucket/input/file.txt .
Process the file
Copy the results to S3, eg: aws s3 cp file.txt s3://my-bucket/output/
Download the results to your own computer, eg: aws s3 cp s3://my-bucket/output/file.txt .
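To illustrate, the upload and download on your own computer could be scripted with boto3 rather than typed by hand. This is only a sketch; the bucket name, key prefixes, and filenames are placeholders, and the processing step on the EC2 instance still runs separately:

import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"   # placeholder bucket name

# Upload the input file for the EC2 script to pick up
s3.upload_file("file.txt", bucket, "input/file.txt")

# ... the script on the EC2 instance downloads the file, processes it,
# and copies the result to s3://my-bucket/output/ ...

# Once the result appears, pull it down locally
s3.download_file(bucket, "output/file.txt", "file.txt")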
Use scp to copy files
Assuming that you are connecting to a Linux instance, you could automate it via the steps below (a Python sketch follows the list):
Use scp to copy the file to the EC2 instance (which is very similar to the SSH command)
Use ssh with a [remote command](https://malcontentcomics.com/systemsboy/2006/07/send-remote-commands-via-ssh.html) parameter to trigger the remote process
Use scp to copy the file down once complete
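If you would rather drive those three steps from Python than from a shell script, the standard library's subprocess module can call scp and ssh directly. A minimal sketch, assuming key-based SSH access; the host name, key file, and remote script are placeholders:

import subprocess

host = "ec2-user@ec2-12-34-56-78.compute-1.amazonaws.com"   # placeholder address
key = "my-key.pem"                                          # placeholder key file

# 1. Copy the input file up to the instance
subprocess.run(["scp", "-i", key, "file.txt", f"{host}:/home/ec2-user/"], check=True)

# 2. Trigger the processing script remotely
subprocess.run(["ssh", "-i", key, host, "./process.sh file.txt"], check=True)

# 3. Copy the result back down
subprocess.run(["scp", "-i", key, f"{host}:/home/ec2-user/output.txt", "."], check=True)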
Re-architect to use AWS Lambda
If the job that runs on the data is suitable for being run as an AWS Lambda function, then the flow would be:
Upload the data to Amazon S3
This automatically triggers the Lambda function, which processes the data and stores the result
Download the result from Amazon S3
Please note that an AWS Lambda function runs for a maximum of 15 minutes and has a limit of 512MB of temporary disk space. (This can be expanded by using Amazon EFS if needed.)
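For reference, a handler for that flow might look roughly like the sketch below. The bucket layout and the processing step are placeholders, and it assumes the S3 trigger is limited to the input/ prefix so that writing the result does not re-invoke the function:

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # The S3 event identifies the object that was just uploaded
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    # Download to Lambda's temporary storage
    s3.download_file(bucket, key, "/tmp/input.txt")

    # Stand-in for your real processing logic
    with open("/tmp/input.txt") as src, open("/tmp/output.txt", "w") as dst:
        dst.write(src.read().upper())

    # Store the result under a different prefix
    s3.upload_file("/tmp/output.txt", bucket, "output/" + key.split("/")[-1])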
Something in-between
There are other ways to upload/download data, such as running a web server on the EC2 instance and interacting via a web browser, or using AWS Systems Manager Run Command to trigger the process on the EC2 instance. Such a choice would be based on how much you are permitted to modify what is running on the instance and your technical capabilities.
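As an example of the Run Command option, the job could be triggered from your own machine with boto3, assuming the instance has the SSM Agent running and an appropriate instance profile. The instance ID and command are placeholders:

import boto3

ssm = boto3.client("ssm")

# Ask Systems Manager to run a shell command on the instance
response = ssm.send_command(
    InstanceIds=["i-0123456789abcdef0"],              # placeholder instance ID
    DocumentName="AWS-RunShellScript",
    Parameters={"commands": ["/home/ec2-user/process.sh"]},
)
print(response["Command"]["CommandId"])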
@John Rotenstein we have solved the problem of loading 60MB+ models into Lambdas by attaching AWS EFS volumes via VPC. This also solves the problem of large libraries such as TensorFlow, OpenCV, etc. Basically, Lambda layers almost become redundant and you can really sit back and relax; this saved us days if not weeks of tweaking, building and cherry-picking library components from source, allowing us to concentrate on the real problem. It beats loading from S3 every time, too. The EFS approach would obviously require an EC2 instance.
Related
I am new to the AWS CLI and finally have my commands working through Git Bash with:
aws s3 ls --no-verify-ssl
I am now trying to run the same commands from Python.
I need to be able to do the following tasks in AWS s3 from Python:
Copy hundreds of local folders to the s3 bucket.
Update existing folders on the s3 bucket with changes made on local versions.
List contents of the s3 bucket.
In reading similar posts here, I see that --no-verify-ssl means there is a bigger problem; however, using it is the way our network people have set things up, and I have no control over that. This is the flag they require to be used to allow access to the AWS CLI.
I have tried using boto3 and running the equivalent commands from Python, but I get an authentication error because I don't know how to pass the --no-verify-ssl flag from Python.
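For what it's worth, the CLI's --no-verify-ssl roughly corresponds to passing verify=False when creating a boto3 client, so the tasks above might be sketched as below. The bucket name and local folder are placeholders, and disabling certificate verification carries the same caveats as the CLI flag:

import os
import boto3

# verify=False is the boto3 counterpart of the CLI's --no-verify-ssl
s3 = boto3.client("s3", verify=False)

# List the contents of the bucket
for obj in s3.list_objects_v2(Bucket="my-bucket").get("Contents", []):
    print(obj["Key"])

# Upload (or re-upload) every file under a local folder, using the relative path as the key
for root, _, files in os.walk("local-folder"):
    for name in files:
        path = os.path.join(root, name)
        s3.upload_file(path, "my-bucket", path.replace("\\", "/"))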
I am an absolute beginner in AWS: I have created a key and an instance. The Python script I want to run in the EC2 environment needs to loop through around 80,000 filings, tokenize the sentences in them, and use these sentences for some unsupervised learning.
This might be a duplicate, but I can't find a way to copy these filings to the EC2 environment and run the Python script in EC2. I am also not very sure how to use boto3. I am using macOS. I am just looking for any way to speed things up. Thank you so much! I am forever grateful!
Here's what I tried recently:
Create the bucket and make it publicly accessible.
Create the role and add HTTP option.
Upload all the files and make sure they are publicly accessible.
Get the HTTP link of the S3 file.
Connect to the instance through PuTTY.
Use wget to copy the file onto the EC2 instance.
If your files are in zip format, a single copy is enough to move all the files onto the instance.
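The same fetch could also be done from Python instead of wget, using only the public HTTPS link (the URL and filename are placeholders):

import urllib.request

# Download a publicly accessible S3 object over HTTPS
url = "https://my-bucket.s3.amazonaws.com/filings.zip"   # placeholder public link
urllib.request.urlretrieve(url, "filings.zip")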
Here's one way that might help:
create a simple IAM role that allows S3 access to the bucket holding your files
apply that IAM role to the running EC2 instance (or launch a new instance with the IAM role)
install the awscli on the EC2 instance
SSH to the instance and sync the S3 files to the EC2 instance using aws s3 sync
run your app
I'm assuming you've launched the EC2 instance with enough disk space to hold the files.
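If you would rather stay in Python on the instance than shell out to aws s3 sync, a boto3 loop over the bucket does much the same job. The bucket name and prefix are placeholders, and credentials come from the instance's IAM role:

import os
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"        # placeholder

# Download every object under the "filings/" prefix
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix="filings/"):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if key.endswith("/"):
            continue        # skip "folder" placeholder keys
        os.makedirs(os.path.dirname(key) or ".", exist_ok=True)
        s3.download_file(bucket, key, key)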
I have a python program that ultimately writes a csv file using pandas:
df.to_csv(r'path\file.csv')
I was able to upload the files to the server via FileZilla and was also able to run the program on the EC2 server normally. However, I would now like to export a csv file to my local machine, but I don't know how to.
Do I have to write the csv file directly to a cloud drive (e.g. google drive via Pydrive)? What would be the easiest way?
You probably do not want to expose your computer to the dangers of the Internet. Therefore, it is better for your computer to 'pull' the data down, rather than allowing something on the Internet to 'push' something to it.
You could send the data to Amazon S3 or, if you are using a cloud-based storage service like Google Drive or Dropbox, use their SDK to upload the file to their storage.
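As a small example of the S3 route, the script on the EC2 instance could upload the CSV right after writing it, and you then pull it down from your own machine. The bucket and filenames are placeholders:

import boto3
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})     # stand-in for your real DataFrame
df.to_csv("file.csv", index=False)

# On the EC2 instance: push the result to S3
s3 = boto3.client("s3")
s3.upload_file("file.csv", "my-bucket", "output/file.csv")

# On your local machine: pull it down
# s3.download_file("my-bucket", "output/file.csv", "file.csv")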
I am writing a Python application which reads/parses a file of this kind, myalerts.ini:
value1=1
value2=3
value3=10
value4=15
Currently I store this file in the local filesystem. If I need to change this file, I need physical access to this computer.
I want to move this file to the cloud so that I can change it from anywhere (another computer or from my phone).
If this application is running on some machine, I should be able to change the file in the cloud, and the application running on another machine (which I don't have physical access to) will be able to read the updated file.
Notes,
I am new to both Python and AWS.
I am currently running it on my local Mac/Linux machine and planning on deploying on AWS.
There are many options!
Amazon S3: This is the simplest option. Each computer could download the file at regular intervals or just before they run a process. If the file is big, the app could instead check whether the file has changed before downloading.
Amazon Elastic File System (EFS): If your applications are running on multiple Amazon EC2 instances, EFS provides a shared file system that can be mounted on each instance.
Amazon DynamoDB: A NoSQL database instead of a file. Much faster than parsing a file, but less convenient for updating values — you'd need to write a program to update values, eg from the command-line.
AWS Systems Manager Parameter Store: A managed service for storing parameters. Applications (anywhere on the Internet) can request and update parameters. A great way to configure cloud-based applications!
If you are looking for minimal change and you want it accessible from anywhere on the Internet, Amazon S3 is the easiest choice.
Whichever way you go, you'll use the boto3 AWS SDK for Python.
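For the Amazon S3 option, the application could fetch and parse the file at startup using boto3 together with configparser. A minimal sketch with a placeholder bucket name; note that the file shown above has no [section] header, so one is added while parsing:

import configparser
import boto3

s3 = boto3.client("s3")

# Pull the latest copy of the settings file from S3
s3.download_file("my-config-bucket", "myalerts.ini", "myalerts.ini")

# configparser needs a section header, so prepend one
config = configparser.ConfigParser()
with open("myalerts.ini") as f:
    config.read_string("[alerts]\n" + f.read())

print(config["alerts"]["value1"])   # -> '1'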
I have a CSV which is stored in an AWS S3 bucket and is used to store information which gets loaded into an HTML document via some jQuery.
I also have a Python script which is currently sat on my local machine ready to be used. This Python script scrapes another website and saves the information to the CSV file which I then upload to my AWS S3 bucket.
I am trying to figure out a way that I can have the Python script run nightly and overwrite the CSV stored in the S3 bucket. I cannot seem to find a similar solution to my problem online and am vastly out of my depth when it comes to AWS.
Does anyone have any solutions to this problem?
Cheapest way: Modify your Python script to work as an AWS Lambda function, then schedule it to run nightly.
Easiest way: Spin up an EC2 instance, copy the script to the instance, and schedule it to run nightly via cron.
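A rough sketch of the Lambda route: the function writes the CSV to its temporary storage and overwrites the object in S3, and an Amazon EventBridge (CloudWatch Events) schedule invokes it nightly. The bucket, key, and the scraping step are placeholders:

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Stand-in for your existing scraping logic
    rows = [["name", "value"], ["example", "1"]]

    # Write the CSV to Lambda's temporary storage
    with open("/tmp/data.csv", "w") as f:
        f.write("\n".join(",".join(row) for row in rows))

    # Overwrite the CSV that the jQuery front end reads
    s3.upload_file("/tmp/data.csv", "my-bucket", "data.csv")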