How to run a Python file in an AWS S3 bucket from EC2? - python

I tried to run a Python file from AWS S3 storage like
python s3://test-bucket/test/py_s3_test.py
I'm getting this error:
python: can't open file 's3://test-bucket/test/py_s3_test.py': [Errno 2] No such file or directory
Is there any way to run a Python file that resides in AWS S3?
Thank you.

Try this one, it will work.
aws s3 cp s3://yourbucket/path/to/file/hello.py - | python
Explanation: it downloads the file from S3 and pipes the stream to Python for execution.
Alternatively, you could split it into multiple steps: download the file, save it locally, and execute the locally saved copy (see the sketch below).
Hope it helps!
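If you prefer the multi-step route in Python itself, a minimal sketch using boto3 could look like the following. It assumes the EC2 instance has credentials that can read the bucket and that /tmp is writable; the bucket and key are taken from the question.
# Minimal sketch, assuming boto3 is installed and the instance can read the bucket
import subprocess
import boto3

s3 = boto3.client("s3")

# Download the script from S3 to a local file (example path under /tmp)
s3.download_file("test-bucket", "test/py_s3_test.py", "/tmp/py_s3_test.py")

# Execute the locally saved copy with the local Python interpreter
subprocess.run(["python", "/tmp/py_s3_test.py"], check=True)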

Related

Reading .mdb or .accdb file from s3 bucket in AWS lambda function and converting it into excel or csv using python

I have a use case where I need to read tables from an MS Access file (.mdb or .accdb) that is stored in an AWS S3 bucket, convert it to a CSV or Excel file in an AWS Lambda function, and upload the converted file back to the S3 bucket.
I found approaches using the pyodbc library, but they don't work on AWS, especially when the file is stored in an S3 bucket.
That's because the S3 bucket isn't an SMB file share.
Download the database file to an SMB file server - a server running Windows Server or Linux with Samba - and access the file at that location.
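As a rough sketch of that first step, you could copy the file out of S3 with boto3 to a path that a machine with the Access ODBC driver can reach. The bucket, key, and destination path below are placeholders, not from the question.
import boto3

s3 = boto3.client("s3")
# Download the Access database from S3 to a local path on the file server
s3.download_file("your-bucket", "path/to/database.accdb", r"C:\shares\data\database.accdb")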

config.yml not found on Databricks

I have a Python project which queries a SQL Server database and does some transformation within SQL Server. This project uses config.yml, which has all the DB-related properties.
Now I'm trying to host this on Databricks so that I can run it as a notebook. I have imported all the Python files into the Databricks workspace, but while executing the main .py file I get the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'config.yml'
This is because Databricks does not allow me to import a .yml file into the workspace. What can I do to run this Python project so that it reads the .yml file and creates a DB connection properly?
Thanks!
You can put your .yaml file onto DBFS and point to it. You can do it in different ways:
Using dbutils.fs.put (see doc)
Using the Databricks CLI's databricks fs cp command from your local machine - you will need to install the databricks-cli Python package on it and configure it to use personal access tokens if they are enabled in your workspace (see doc)
Uploading the file via the file browser, or directly from the notebook's menu (see doc)
Because your code works with "local" files, you will need to specify the path to the file as /dbfs/<file-path-on-dbfs> - in this case, the file will be read by the "normal" Python file API.
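For example, once the file is on DBFS (the dbfs:/configs/config.yml path below is just an assumption), the code can read it through the /dbfs mount with the normal file API:
# Assumes the config was uploaded to dbfs:/configs/config.yml beforehand
import yaml

with open("/dbfs/configs/config.yml") as f:
    config = yaml.safe_load(f)
# config is now a normal dict holding the DB-related properties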

how to unzip the file from s3 using EC2?

I have a 150 GB file in S3. I would like to unzip it and upload each file back to S3. What is the best approach to do this with Python and EC2? I appreciate your response.
Download it to your system.
Unzip the archive; it will create a normal folder, say "Unzipped_Folder".
Assuming you are using Windows, install the AWS CLI on it.
Create an IAM user with S3 write access to that bucket and create an Access Key and Secret Access Key.
Add the credentials to the AWS CLI from the command prompt:
$ aws configure
Now run the following command to send the files to the S3 bucket:
$ aws s3 cp your_directory_path s3://your_bucket_name --recursive
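Since the question asks about Python, here is a minimal sketch of the same download-unzip-upload flow with boto3, assuming the archive is a .zip that fits on the instance's local disk; the bucket name and keys are placeholders.
import os
import zipfile
import boto3

s3 = boto3.client("s3")
bucket = "your-bucket-name"          # placeholder
archive_key = "path/to/archive.zip"  # placeholder

# Download the archive to local disk
s3.download_file(bucket, archive_key, "archive.zip")

# Extract it into a local folder
with zipfile.ZipFile("archive.zip") as zf:
    zf.extractall("Unzipped_Folder")

# Upload each extracted file back to S3 under a new prefix
for root, _, files in os.walk("Unzipped_Folder"):
    for name in files:
        local_path = os.path.join(root, name)
        key = "unzipped/" + os.path.relpath(local_path, "Unzipped_Folder").replace(os.sep, "/")
        s3.upload_file(local_path, bucket, key)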

How can I download encrypted .gz file from AWS S3 bucket

I back up a Nextcloud instance, compress and encrypt it, and then I store it in an S3 bucket. When I enter the bucket I have a file called backup.tar.gz. Great.
Now I download the file from the bucket and I get a .tar file, which obviously renders my backup completely useless. I can't decrypt it. If I manually save the file with the extension .gz it still doesn't decrypt, but says tar: Error opening archive: Unrecognized archive format.
When I try to download it via the CLI it doesn't download the file, but says usage: aws [options] <command> <subcommand> [<subcommand> ...] [parameters]:
aws s3 cp s3://mybucket/myfile.tar.gz
The problem is also mentioned here: https://forums.aws.amazon.com/thread.jspa?threadID=250926 where the only answer is, "why not use an AWS snapshot to back up your EC2 instance".
I don't get why AWS would do that, or is this a bug? Is there any decent way to get my backup from my bucket in the format I specify?
I could also do this with a Python script if necessary, but I don't know whether that would work any better.
Help is very much appreciated! Thanks in advance!
I encrypt like this:
tar -czf - /path/to/file | openssl enc -e -aes256 -pbkdf2 -salt -out backup.tar.gz -pass pass:$ENCRYPTION_PASS
You can download a file from S3 using the awscli as follows:
aws s3 cp s3://mybucket/myfile.tar.gz .
Note the trailing period which means 'to the current folder'.
Or you could rename the file during download, as follows:
aws s3 cp s3://mybucket/myfile.tar.gz xyz.tar.gz
It's possible, I suppose, that if you download a tar.gz using a browser or other tool (not the awscli) then that tool might try to be helpful and uncompress the GZ file for you in flight, resulting in a TAR file. But the awscli won't do that natively.
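If you would rather do it from a Python script, as the question suggests, a short sketch with boto3 is below. boto3 downloads the object body as-is, so the encrypted archive should arrive byte-for-byte as it was uploaded; the bucket and key are taken from the question.
import boto3

s3 = boto3.client("s3")
# Download the object without any decompression or re-encoding
s3.download_file("mybucket", "myfile.tar.gz", "myfile.tar.gz")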

Zipping the files in S3

I have some text files in an S3 location. I am trying to compress and zip each text file in it. I was able to zip and compress them in a Jupyter notebook by selecting the files from my local machine. When trying the same code against S3, it throws an error saying the file is missing. Could someone please help?
Amazon S3 does not have a zip/compress function.
You will need to download the files, zip them on an Amazon EC2 instance or your own computer, then upload the result.
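A minimal sketch of that approach with boto3, assuming the text files sit under a prefix in a bucket; the bucket name and prefix below are placeholders.
import io
import zipfile
import boto3

s3 = boto3.client("s3")
bucket = "your-bucket-name"   # placeholder
prefix = "text-files/"        # placeholder

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if not key.endswith(".txt"):
            continue
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

        # Zip the file in memory, then upload the archive next to the original
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
            zf.writestr(key.split("/")[-1], body)
        buf.seek(0)
        s3.put_object(Bucket=bucket, Key=key + ".zip", Body=buf.getvalue())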
