I have some text files in an S3 location and I am trying to compress each of them into a zip archive. I was able to zip and compress a file in a Jupyter notebook by selecting it from my local machine, but when I try the same code against S3 it throws an error saying the file is missing. Could someone please help?
Amazon S3 does not have a zip/compress function.
You will need to download the files, zip them on an Amazon EC2 instance or your own computer, then upload the result.
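As an illustration, a minimal sketch of that download-zip-upload flow with boto3 and zipfile might look like this (the bucket name, key, and local paths are placeholders):

import zipfile
import boto3

s3 = boto3.client('s3')
bucket = 'my-bucket'          # placeholder bucket name
key = 'folder/report.txt'     # placeholder object key

# 1. Download the text file from S3 to local disk
s3.download_file(bucket, key, '/tmp/report.txt')

# 2. Compress it locally into a zip archive
with zipfile.ZipFile('/tmp/report.zip', 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.write('/tmp/report.txt', arcname='report.txt')

# 3. Upload the zip back to S3 next to the original
s3.upload_file('/tmp/report.zip', bucket, 'folder/report.zip')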
I have been trying to read SAS (sas7bdat) files from AWS S3 with a Glue job. For this, I found a library to read the files (https://github.com/openpharma/sas7bdat). However, when I try to read these files, my job doesn't find the directory, even though the directory exists and the file is inside it.
When I check the logs, the error looks related to Java/JAR. I am a beginner with both AWS and SAS files. How can I read SAS files with Glue? Is there an easier way?
I previously worked in AWS and am new to Google Cloud. In AWS there was a way to upload directories/folders to a bucket. I have done a bit of research on uploading a directory/folder to a Google Cloud bucket but couldn't find anything. Can someone help me? I would like to upload some folders (not files) inside a folder to Google Cloud Storage using Python. How do I do that?
To achieve this, you need to upload the contents of each directory file by file and replicate your local paths in your GCS bucket.
Note: directories don't exist in GCS; a "directory" is simply a set of objects sharing the same path prefix, presented as a folder in the UI.
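For example, a rough sketch using the google-cloud-storage Python client (the bucket name, local path, and prefix are placeholders) could walk the local directory and upload each file with its relative path as the object name:

import os
from google.cloud import storage

def upload_directory(bucket_name, local_dir, gcs_prefix=''):
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    for root, _, files in os.walk(local_dir):
        for filename in files:
            local_path = os.path.join(root, filename)
            # Reuse the local relative path as the object name, so the files
            # show up under the same "folders" in the GCS UI.
            relative_path = os.path.relpath(local_path, local_dir)
            blob_name = f"{gcs_prefix}/{relative_path}".lstrip('/').replace(os.sep, '/')
            bucket.blob(blob_name).upload_from_filename(local_path)

upload_directory('my-bucket', '/path/to/local/folder', 'parent-folder')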
I have a JSON file with over 16k URLs of images, which I parse using a Python script that calls urllib.request.urlretrieve to retrieve the images. I uploaded the JSON file to Google Drive and ran the Python script in Google Colab.
Although the files were downloaded (I checked this with a print line in the try block around urlretrieve) and it took substantial time to download them, I am unable to see where they were stored. When I ran the same script on my local machine, it stored the files in the current folder.
As an answer to this question suggests, the files may be downloaded to some temporary location, say, on some cloud. Is there a way to dump these temporary files to google drive?
(Note: I had mounted the drive in the Colab notebook, yet the files still don't appear in Google Drive.)
Colab stores files in a temporary location that is new every time you run the notebook. If you want your data to persist across sessions, you need to store it in Google Drive. For that you need to mount a Drive folder in your notebook and use its path, and you need to give Colab permission to access your Drive.
After mounting Google Drive, you can move files from Colab to Drive with the command:
!mv /content/filename /content/gdrive/My\ Drive/
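For reference, here is a minimal sketch of the whole flow in the notebook; you can also skip the mv step by saving downloads directly into the mounted folder (the 'images' folder name and URL are just examples):

import os
import urllib.request
from google.colab import drive

drive.mount('/content/gdrive')                   # asks for authorization once

save_dir = '/content/gdrive/My Drive/images'     # example folder that persists in Drive
os.makedirs(save_dir, exist_ok=True)

# Save each download directly into the mounted Drive folder
urllib.request.urlretrieve('https://example.com/image.jpg',
                           os.path.join(save_dir, 'image.jpg'))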
I tried to run a Python file from AWS S3 storage like this:
python s3://test-bucket/test/py_s3_test.py
I'm getting this error:
python: can't open file 's3://test-bucket/test/py_s3_test.py': [Errno 2] No such file or directory
Is there any way to run a Python file that resides in AWS S3?
Thank you.
Try this one, it will work.
aws s3 cp s3://yourbucket/path/to/file/hello.py - | python
Explanation: it downloads the file from S3 and then pipes the stream to Python for execution.
Alternatively, you could split it into multiple steps: download the file, save it locally, and execute the locally saved file, as in the sketch below.
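For instance, a rough equivalent of that multi-step approach in Python with boto3 (the bucket and key names are hypothetical):

import subprocess
import boto3

s3 = boto3.client('s3')

# Download the script to a local file, then run it with the local interpreter
s3.download_file('yourbucket', 'path/to/file/hello.py', '/tmp/hello.py')
subprocess.run(['python', '/tmp/hello.py'], check=True)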
Hope it helps!
I use Amazon S3 for storing my resources, but sometimes I find it necessary to open a file that's stored on S3 in order to perform some operations on it.
Is it at all possible (and advisable) to open the files directly from S3, or should I just stick to using a temporary "scratch" folder?
Right now I am using the boto extensions for interfacing with Amazon.
It's not possible to open a file in place on S3; you can only read objects or add/replace them over the network.
There is an open-source command-line tool called s3fs that emulates mounting an S3 bucket as a user-space file system. With the bucket mounted like this you can use any commands you would use on ordinary files to open, read, and write a file, but behind the scenes it does some local caching of your writes and uploads the file when you close the handle.
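Short of mounting with s3fs, you can read an object's contents over the network straight into memory without a scratch folder. A minimal sketch with boto3 (rather than the older boto the question mentions; the bucket and key names are placeholders):

import boto3

s3 = boto3.client('s3')

# Stream the object's contents into memory; no scratch file on disk
response = s3.get_object(Bucket='my-bucket', Key='path/to/resource.txt')
data = response['Body'].read()           # raw bytes
text = data.decode('utf-8')              # decode if it is a text file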